Description

[0001] The present invention relates to a method and an apparatus for sharing memory among coherence domains 
of computer systems. 

[0002] The sharing of memory among multiple coherence domains presents unique coherence problems. To facilitate
a discussion of these coherence problems, Fig. 1 shows a computer node 100 representing, e.g., a computer node in
a more complex computer system. Within computer node 100, there are shown a plurality of processing nodes 102,
104, and 106 coupled to a common bus 108. Each of processing nodes 102, 104, and 106 represents, for example, a
discrete processing unit that may include, e.g., a processor and its own memory cache. The number of processing
nodes provided per computer node 100 may vary depending on needs, and may include any arbitrary number although
only three are shown herein for simplicity of illustration.

[0003] Within computer node 100, common bus 108 is shown coupled to a memory module 110, which represents
the memory space of computer node 100 and may be implemented using a conventional type of memory such as
dynamic random access memory (DRAM). Memory module 110 is typically organized into a plurality of uniquely
addressable memory blocks 112. Each memory block of memory module 110, e.g., memory block 112(a) or memory
block 112(b), has a local physical address (LPA) within computer node 100, i.e., its unique address maps into the
memory space of computer node 100. Each memory block 112 represents a storage unit for storing data, and each may be
shared among processing nodes 102, 104, and 106 via common bus 108. Of course, there may be provided as many
memory blocks as desired to satisfy the storage needs of computer node 100. In some cases, many memory modules
110 may be provided by computer node 100.

[0004] As is known to those skilled in the art, computer processors, e.g., processor 116 within processing node 102,
typically operate at a faster speed than memory module 110. To expedite access to the memory
blocks 112 of memory module 110, there is usually provided with each processing node, e.g., processing node 102, a
memory cache 114. A memory cache, e.g., memory cache 114, takes advantage of the fact that a processor, e.g.,
processor 116, is more likely to reference memory addresses that it recently referenced than other random memory
locations. Further, memory cache 114 typically employs faster memory and tends to be small, which further contributes
to speedy operation.

[0005] Within memory cache 114, there exists a plurality of block frames 118 for storing copies of memory blocks,
e.g., memory blocks 112. Each block frame 118 has an address portion 120 for storing the address of the memory
block it cached. If the unique address of memory block 112(a) is, e.g., FF5h, this address would be stored in address
portion 120 of a block frame 118 when memory block 112(a) of memory module 110 is cached into memory cache 114.
There is also provided in block frame 118 a data portion 122 for storing the data value of the cached memory block.
For example, if the value stored in memory block 112(a) was 12 when memory block 112(a) was cached into block
frame 118, this value 12 would be stored in data portion 122 of block frame 118.
[0006] Also provided in block frame 118 is a status tag 124 for storing the state of the memory block it cached.
Examples of such states are, e.g., gM, gS, and gI, representing respectively global exclusive, global shared, and global
invalid. The meanings of these states are discussed in greater detail herein, e.g., with reference to Fig. 4.
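
As a rough illustration of the block frame just described, the following Python sketch models a frame with an address portion, a data portion, and a status tag. The class, field, and method names are hypothetical and are chosen only to mirror reference numerals 120, 122, and 124; the values FF5h and 12 are the ones used as examples in the text.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    """Status tag values named in the text."""
    gM = "global exclusive"
    gS = "global shared"
    gI = "global invalid"


@dataclass
class BlockFrame:
    address: int    # address portion 120
    data: int       # data portion 122
    status: State   # status tag 124


class MemoryCache:
    """Hypothetical cache holding block frames keyed by block address."""

    def __init__(self):
        self.frames = {}

    def fill(self, address, data, status):
        # Cache a memory block: record its address, value, and state.
        self.frames[address] = BlockFrame(address, data, status)

    def lookup(self, address):
        # A missing frame or an invalid state both count as a cache miss.
        frame = self.frames.get(address)
        if frame is None or frame.status is State.gI:
            return None
        return frame


cache = MemoryCache()
cache.fill(0xFF5, 12, State.gS)
hit = cache.lookup(0xFF5)
miss = cache.lookup(0x100)
```

Here `hit` carries the cached value 12 together with its address and state, while `miss` is `None` because no frame holds address 0x100.
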

[0007] A processing node may hold an exclusive copy of a memory block in its cache when it is the only entity having
a valid copy. Such exclusive copy may potentially be different from its counterpart in memory module 110, e.g., it may
have been modified by the processing node that cached it. Alternatively, a processing node may possess a shared,
read-only copy of a memory block. When one processing node, e.g., processing node 102, caches a shared copy of
a memory block, e.g., memory block 112(a), other processing nodes, e.g., processing nodes 104 and 106, may also
possess shared copies of the same memory block.

[0008] If a memory block has never been cached in a processing node, or was once cached but is no longer cached therein,
that processing node is said to have an invalid copy of the memory block. No valid data is contained in the block frame
when the state associated with that block frame is invalid.

[0009] The coherence problem that may arise when memory block 112 is shared among the processing nodes of
Fig. 1 will now be discussed in detail. Assume that processing node 102 caches a copy of memory block 112(a) into
its memory cache 114 in order to change the value stored in memory block 112(a) from 12 to 13. Typically, when the value is
changed by a processing node such as processing node 102, that value is not updated back into memory module 110
immediately. Rather, the updating is typically performed when memory cache 114 of processing node 102 writes back
the copy of memory block 112(a) it had earlier cached.

[0010] Now suppose that before memory cache 114 has a chance to write back the changed value of memory block
112(a), i.e., 13, into memory module 110, processing node 104 wishes to reference memory block 112(a). Processing
node 104 would first check its own memory cache 132 to determine whether a copy of memory block 112(a) had
been cached therein earlier. Assuming that a copy of memory block 112(a) has never been cached by processing node
104, a cache miss would occur.

[0011] Upon experiencing the cache miss, processing node 104 may then proceed to obtain a copy of memory block
112(a) from memory module 110. Since the changed value of memory block 112(a) has not been written back into
memory module 110 by processing node 102, the old value stored in memory block 112(a), i.e., 12, would be acquired
by processing node 104. This problem is referred to herein as the coherence problem and has the potential to provide
erroneous values to processing nodes and other devices that share a common memory.
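
The stale read described above can be reproduced with a minimal Python sketch of two write-back caches that share one memory and have no coherence mechanism. The class and method names are hypothetical; the address FF5h and the values 12 and 13 follow the example in the text.

```python
class Memory:
    """Stands in for memory module 110; block 112(a) initially holds 12."""

    def __init__(self):
        self.blocks = {0xFF5: 12}


class WriteBackCache:
    """Hypothetical write-back cache with no coherence protocol."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}      # address -> cached value
        self.dirty = set()   # addresses modified but not yet written back

    def read(self, addr):
        if addr not in self.lines:            # cache miss: fetch from memory
            self.lines[addr] = self.memory.blocks[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.read(addr)                       # make sure the block is cached
        self.lines[addr] = value              # modify only the cached copy
        self.dirty.add(addr)

    def write_back(self, addr):
        if addr in self.dirty:
            self.memory.blocks[addr] = self.lines[addr]
            self.dirty.discard(addr)


memory = Memory()
node_102 = WriteBackCache(memory)
node_104 = WriteBackCache(memory)

node_102.write(0xFF5, 13)       # node 102 changes 12 to 13 in its cache only
stale = node_104.read(0xFF5)    # node 104 misses and reads memory: old value
```

Because node 102 has not yet written its copy back, `stale` holds the old value 12 even though the latest value is 13.
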

[0012] Up to now, the sharing of memory blocks 112 is illustrated only with reference to devices internal to computer
node 100, i.e., devices such as processing nodes 102, 104, and 106 that are designed to be coupled to common bus
108 and communicate thereto employing the same communication protocol. There may be times when it is necessary
to couple computer node 100 to other external devices, e.g., to facilitate the expansion of the computer system.
Oftentimes, the external devices may employ a different protocol from that employed on common bus 108 of computer
node 100 and may even operate at a different speed.

[0013] External device 140 of Fig. 1 represents such an external device. For discussion purposes, external device
140 may represent, for example, an I/O device such as a gateway to a network. Alternatively, external device 140 may
be, for example, a processor such as a Pentium Pro™ microprocessor (available from Intel Corp. of Santa Clara,
California), representing a processor whose protocol and operating speed may differ from those on common bus 108.
As a further example, external device 140 may represent a distributed shared memory agent for coupling computer
node 100 to other entities having their own memory spaces, e.g., other computer nodes having their own memory
modules. Via the distributed shared memory agent, the memory blocks within computer node 100 as well as within
those other memory-space-containing entities may be shared.

[0014] Although an external device may need to share the data stored in memory module 110, it is typically not
possible to couple an external device, such as external device 140, directly to common bus 108 to allow external device
140 to share the memory blocks in memory module 110. The direct coupling is not possible due to, among other things, the
aforementioned differences in protocols and operating speeds.

[0015] An article in Computer Architecture News, Volume 24, No. 2, May 1996, pages 308-317, by Lovett, T., et al.,
entitled "Sting: A CC-NUMA Computer System for the Commercial Market Place", describes a cache coherent
non-uniform memory access (CC-NUMA) multi-processor. Four-processor symmetric multi-processor (SMP) nodes use a
scalable coherent interface (SCI)-based coherent interconnect.
The individual SMP nodes include multiple processors and memory connected via a common bus. A bridge board at
each SMP node implements coherency of local and remote caches using a directory-based cache protocol. A
bus-side local directory contains two bits of state information for each block in a local memory. Bus-side remote tags provide
snooping information for lines in a remote cache. A network-side local memory directory is also maintained and
network-side remote tags are used in the directory-based protocol.

[0016] US Patent US-A-5,522,058 describes a distributed shared memory multi-processor system capable of
reducing traffic on a shared bus, without imposing constraints concerning the type of variables to be accessed in parallel
programs. A plurality of processor units are coupled through a shared bus, with each processor comprising a CPU, a
main memory connected with the CPU through an internal bus, a cache memory associated with the CPU, and the
sharing management unit connected with the main memory and the cache memory through the internal bus. The
sharing management unit includes a main memory tag memory for storing information as to whether each line of the
main memory is present in the cache memory, a cache address tag memory for storing addresses estimated to be
stored in an address tag memory and a cache state memory for storing the estimated cache state of cache memory.
An internal access control unit controls accesses on the internal bus, an external address unit controls access on a
shared bus, a main memory tag memory control unit controls read-out and updating of the main memory tag memory
and a cache state memory control unit controls the read out and updating of the cache state tag memory.
[0017] In view of the foregoing, what is needed is an improved method and apparatus for permitting memory blocks
having a local physical address (LPA) in a particular computer node to be shared, in an efficient and error-free manner,
among interconnected entities such as other processing nodes and external devices.
[0018] An aspect of the present invention provides a method as set forth in claim 1.
[0019] Another aspect of the invention provides a coherence transformer as set forth in claim 16.
[0020] Embodiments of the invention enable the efficient solving of coherence problems when memory blocks having
local physical addresses (LPA) in a particular computer node of a computer system are shared by other nodes of the
system as well as by external entities coupled to that computer node.

[0021] The invention will now be described by way of example with reference to the accompanying drawings,
throughout which like parts are referred to by like references, and in which:

Fig. 1 shows, for discussion purposes, a computer node representing, e.g., a computer node in a more complex
computer system.

Fig. 2 shows, in accordance with one aspect of the present invention, a coherence transformer block. 
Fig. 3 shows, in accordance with one aspect of the present invention, the memory blocks and their associated
memory tags (Mtags).

Fig. 4 shows, in one embodiment of the present invention, the various available states that may be stored in a Mtag. 

Fig. 5 shows in greater detail, in accordance with one aspect of the present invention, the format of a typical
memory access request on common bus 108.

Fig. 6 shows in greater detail, in accordance with one aspect of the present invention, the format of a typical
response to the request of Fig. 5.

Fig. 7A shows, in one embodiment, the functional units within the coherence transformer. 

Fig. 7B illustrates, in one embodiment, some of the external states tracked by the coherence transformer. 

Fig. 8 illustrates, in accordance with one aspect of the present invention, the steps taken by the coherence
transformer in response to a memory access request on the common bus.

Fig. 9 illustrates, in accordance with one aspect of the present invention, the steps taken by the coherence
transformer in response to a memory access request issued by one of the external devices.

Fig. 10 illustrates, in accordance with one aspect of the present invention, the steps taken by the coherence
transformer in the snoop-only mode in response to a memory access request on the common bus.

Fig. 11 illustrates, in accordance with one aspect of the present invention, the steps taken by the coherence
transformer in the snoop-only mode in response to a memory access request issued by one of the external devices.

Figs. 12 and 13 illustrate, in accordance with one aspect of the present invention, the various requests and their
possible responses in the Mtag-only mode.

Fig. 14 illustrates, in one embodiment of the present invention, selected transactions performed by the coherence
transformer in the Mtag-only mode in response to remote memory access requests on the common bus.

Fig. 15 illustrates selected transactions performed by the coherence transformer in the Mtag-only mode in response
to memory access requests from one of the external devices.

[0022] An invention is described for permitting memory blocks having a local physical address (LPA) in a particular
computer node to be shared, in an efficient and error-free manner, among interconnected entities such as internal
processing nodes and external devices. In the following description, numerous specific details are set forth in order to
provide a thorough understanding of the present invention. It will be obvious, however, to one skilled in the art, that
the present invention may be practiced without some or all of these specific details. In other instances, well known
structures and process steps have not been described in detail in order not to unnecessarily obscure the present
invention.

[0023] In accordance with one embodiment of the present invention, there is provided a coherence transformer for
coupling a computer node, e.g., computer node 100, to a plurality of external devices. The coherence transformer
permits an external device, which may employ a different protocol from that employed by computer node 100 and may
even operate at a different speed, to access memory blocks having local physical addresses within computer node
100. In one embodiment of the present invention, the coherence transformer monitors for selected memory access
requests on the bus of computer node 100. If one of the selected memory access requests on the bus of computer
node 100 pertains to a memory block currently cached by an external device, the coherence transformer may provide
the latest copy of that memory block to the requesting entity, thereby avoiding a coherence problem. Further, the
coherence transformer also permits the external devices to coherently obtain copies of memory blocks having local
physical addresses within computer node 100.

[0024] The operational details of the coherence transformer may be better understood with reference to the drawings
that follow. Referring now to Fig. 2, there is provided, in accordance with one embodiment of the present invention, a
coherence transformer 200 for coupling computer node 100 to one of a plurality of external devices 202, 204, and 206.
Note that although only one of each type of external device (202, 204, or 206) is shown for ease of illustration, there
may in fact exist many external devices of each type coupled to coherence transformer 200. Via coherence transformer
200, the contents of the memory blocks of memory module 110, e.g., memory blocks 112, may be accessed by any of
external devices 202, 204, and 206. In accordance with one aspect of the present invention, memory blocks of memory
module 110 may be shared by the external devices although these external devices employ protocols and operate at
speeds different from those on common bus 108 of computer node 100.

[0025] External device 202 may represent, for example, an I/O device such as a gateway to a computer network that
may obtain a few memory blocks 112 at a time from memory module 110 via coherence transformer 200. External
device 204 may represent, for example, a coherence domain such as a processor, whose internal protocol and operating
speed may differ from that running on common bus 108. Examples of differences include differences in block sizes
and signaling. External device 206 may represent, for example, a distributed shared memory agent device.

[0026] Distributed shared memory agent device 206 may include logic circuitry for connecting computer node 100
to other distributed shared memory (DSM) domains such as other computer nodes to facilitate the sharing of memory
blocks among different DSM domains and with computer node 100. Further, distributed shared memory agent device
206 may permit a processing node 102 in computer node 100 to access both memory block 112 within its local memory
module 110 and memory blocks associated with memory modules within computer nodes 150, 160, and
170, and vice versa. The use of distributed shared memory agent 206 creates the illusion that there is a centralized
shared memory resource that the processors within computer nodes 100, 150, 160, and 170 may access although this
centralized memory resource is physically implemented and distributed among different computer nodes.
[0027] Coherence transformer 200 may communicate with common bus 108 of computer node 100 via a coherence
transformer link 220. On the external domain, coherence transformer 200 may communicate with any of the external
devices, e.g., any of external devices 202, 204, and 206, via links 222, 224, and 226 using a protocol that is appropriate
for the external device with which it communicates.

[0028] Referring now to Fig. 3, there are shown in memory module 110, in accordance with one aspect of the present
invention, a plurality of memory tags (Mtags) 252. Each Mtag 252 is logically associated with a memory block within
memory module 110. In one embodiment, Mtags 252 are implemented in the same memory space, e.g., dynamic
random access memory (DRAM), as the memory blocks with which they are associated and may be physically adjacent
to their respective memory blocks 112. In another embodiment, Mtags 252 are logically associated with their respective
memory blocks 112, albeit being implemented in a different memory space.

[0029] A Mtag 252 tracks the global state of its respective memory block, i.e., whether computer node 100 has
exclusive, shared, or invalid access to a memory block (irrespective of which processing node has that memory block).
Fig. 4 shows, in one embodiment of the present invention, the various available states that may be stored in a Mtag
252. In Fig. 4, three possible states are shown: gI, gS, or gM, signifying respectively that an invalid, shared, or exclusive
copy of a memory block is being held by internal entities, i.e., entities within computer node 100. Note that for the
purposes of the present invention, the state of a Mtag 252 is determined by whether its associated memory block is
referenced by internal entities (e.g., by memory module 110 or any of processors 102, 104, and 106) or by devices in
the external domain (i.e., external to computer node 100 such as any of external devices 202, 204, and 206). Further,
the state of each Mtag is generally independent of which specific device within these domains currently has the memory
block. Consequently, a Mtag can generally indicate whether an external device has a valid copy of a memory block.
The state of a Mtag generally cannot indicate which device, either internally or externally, currently has the latest valid
copy.

[0030] If the state of Mtag 252 is gM, the internal domain has a valid, exclusive (and potentially modified from the
copy in memory module 110) copy of the associated memory block. Further, there can be no valid (whether exclusive
or shared) copy of the same memory block in the external domain since there can be no other valid copy of the same
memory block existing anywhere when an exclusive copy is cached by a given device. If the state of Mtag 252 is gS,
the internal domain has a valid, shared copy of the associated memory block. Further, since many shared copies of
the same memory block can exist concurrently in a computer system, the external domain may have other shared
copies of the same memory block as well. If the state of Mtag 252 is gI, the internal domain does not have a valid copy
of the associated memory block. Since neither memory module 110 nor any bus entities 102, 104, and 106 has a valid
copy, the valid copy may reside in the external domain. In one embodiment, when the state of Mtag 252 is gI, it is
understood that the external domain has an exclusive (and potentially modified) copy of the associated memory block.
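
The three Mtag states and what each implies for the two domains can be summarized in a short Python sketch. The function names are hypothetical, and the treatment of gI follows the one embodiment mentioned above, in which gI is understood to mean that the external domain holds an exclusive copy.

```python
from enum import Enum


class Mtag(Enum):
    gI = "gI"   # internal domain holds no valid copy
    gS = "gS"   # internal domain holds a valid shared copy
    gM = "gM"   # internal domain holds a valid exclusive copy


def internal_has_valid_copy(mtag):
    """True when some entity inside the computer node holds a valid copy."""
    return mtag is not Mtag.gI


def external_may_hold(mtag):
    """Kinds of valid copy the external domain may hold for a given Mtag."""
    if mtag is Mtag.gM:
        return set()            # exclusive inside, so nothing valid outside
    if mtag is Mtag.gS:
        return {"shared"}       # shared copies may coexist in both domains
    return {"exclusive"}        # gI, per the one embodiment described


summary = {m.name: sorted(external_may_hold(m)) for m in Mtag}
```

Note that the sketch answers only domain-level questions; as the text points out, a Mtag does not identify which specific device holds the latest valid copy.
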
[0031] Fig. 5 shows in greater detail, in accordance with one aspect of the present invention, the format of a memory
access request 400, representing a typical memory access request on common bus 108. The memory access request
may be output by, for example, one of the processing nodes 102, 104, or 106 or by coherence transformer 200 on
behalf of one of the external devices 202, 204, or 206.

[0032] Memory access request 400 typically includes a type field 402, an address field 404, a source ID field (SID)
406, and an own flag 408. Type field 402 specifies the type of memory access request being issued. As will be discussed
in detail in connection with Fig. 8 herein, memory access request types specified in field 402 may include, among
others, a request to own (RTO), remote request to own (RRTO), request to share (RTS), remote request to share
(RRTS), and write back (WB). Address field 404 specifies the address of the memory block being requested by the
progenitor of memory access request 400. Source ID field 406 specifies the identity of the progenitor of memory access
request 400, i.e., the entity that issues memory access request 400.

[0033] Own flag 408 represents the flag bit that is normally reset until one of the entities other than memory 110 that
is capable of servicing the outstanding memory access request, e.g., one of processing nodes 102-106, sets own flag
408. An entity coupled to common bus 108 may wish to set own flag 408 to indicate that the current memory access
request should not be serviced by memory module 110, i.e., one of the entities capable of caching that memory block
has done so and may now potentially have a newer copy than the copy in memory module 110.
[0034] Fig. 6 shows in greater detail, in accordance with one embodiment of the present invention, the format of a
response 500. Response 500 is typically issued by the entity responding to an earlier issued memory access request,
e.g., one having the format of memory access request 400 of Fig. 5. As is shown in Fig. 6, response 500 includes a
source ID (SID) field 502, representing the unique ID of the requesting entity to which the response should be sent. In
one embodiment, the content of SID field 502 is substantially similar to the SID data contained in source ID field 406
of Fig. 5. The use of the source ID permits coherence transformer 200 to communicate directly with common bus 108
and entitles coherence transformer 200 to rely on the mechanism of common bus 108 to forward the response, using
the SID, to the appropriate final destination. Response 500 further includes a data field 504, representing the content
of the relevant memory block.
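
To make the request and response formats concrete, the following Python sketch models memory access request 400 and response 500 as simple records. The field names mirror reference numerals 402-408 and 502-504; the `service` function, the use of the node's reference numeral as a source ID, and the `newer_cached_copy` parameter are hypothetical simplifications of how the own flag keeps memory module 110 from answering with an outdated value.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RequestType(Enum):
    RTO = "request to own"
    RRTO = "remote request to own"
    RTS = "request to share"
    RRTS = "remote request to share"
    WB = "write back"


@dataclass
class MemoryAccessRequest:
    type: RequestType   # type field 402
    address: int        # address field 404
    sid: int            # source ID field 406
    own: bool = False   # own flag 408, normally reset


@dataclass
class Response:
    sid: int            # SID field 502: the requester to send the response to
    data: int           # data field 504: content of the memory block


def service(request, memory, newer_cached_copy: Optional[int] = None):
    """Answer a request; a caching entity sets the own flag to pre-empt memory."""
    if newer_cached_copy is not None:
        request.own = True                      # memory must not respond
        return Response(request.sid, newer_cached_copy)
    return Response(request.sid, memory[request.address])


memory = {0xFF5: 12}
plain = MemoryAccessRequest(RequestType.RTS, 0xFF5, sid=104)
from_memory = service(plain, memory)

owned = MemoryAccessRequest(RequestType.RTS, 0xFF5, sid=104)
from_cache = service(owned, memory, newer_cached_copy=13)
```

In both cases the response carries the requester's SID, so the bus can route it back without the responder knowing anything else about the requester.
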

[0035] Fig. 7A shows, in one embodiment, the functional units within coherence transformer 200. In one embodiment,
the functional units are implemented as digital logic circuits. As can be appreciated by those skilled in the art, however,
these functional units may be implemented either in hardware (digital or analog) or in software, depending on needs.
Within coherence transformer 200, there is shown a tag array 250, representing the
mechanism for keeping track of the memory blocks accessed by a device on the external side, e.g., one of external
devices 202, 204, and 206. Within tag array 250, there is shown a plurality of tags 273, 274, 276, 278. In one
embodiment, there may be provided as many tags in tag array 250 as reasonably possible. As will be discussed in greater
detail later, the provision of a large number of tags in snoop tag array 250 advantageously minimizes any impact on
the bandwidth of common bus 108 when a large number of memory blocks are cached by the external domain. Of
course, the number of tags in tag array 250 may vary depending on needs and may represent any arbitrary number.
[0036] In accordance with one embodiment of the invention, externally cached memory blocks are tracked, for the
duration that they are externally cached, in tags within snoop tag array 250 whenever possible. When tags run out,
i.e., when there are more memory blocks currently cached externally than there are available tags within snoop tag
array 250, the embodiment advantageously permits the extra memory blocks to be externally cached without tracking
them in snoop tag array 250 for the entire duration that they are externally cached.

[0037] As will be described in detail herein, buffer 280 represents a tag especially dedicated for temporarily storing
memory blocks that are going to be cached externally without being tracked in snoop tag array 250 (referred to herein
as the Mtag-only approach). In other words, the purpose of buffer 280 is to temporarily track memory blocks cached
externally in accordance with the Mtag-only approach and while in transit. Once that memory block is properly cached
externally using the Mtag-only approach, e.g., by writing back to memory module 110 the proper Mtag state, buffer
280 may be recycled to temporarily track another memory block externally cached using the Mtag-only approach and
while in transit. In one embodiment, multiple buffers 280 may be provided to track memory blocks in transit and cached
externally using the Mtag-only approach.

[0038] Note that buffer 280 does not track an externally cached memory block for the entire duration that that memory
block is cached externally. In other words, buffer 280 may be recycled for reuse in temporarily storing the identity of
another externally tracked memory block that is in transit and externally cached using the Mtag-only approach even if
the last memory block it temporarily stored is still being cached externally. This is different from the function provided
by tags of snoop tag array 250, e.g., tags 273, 274, 276, and 278, which track externally cached memory blocks for
the entire duration that they are cached externally, and are only recycled for reuse when the memory blocks they track
are no longer externally cached. Because of its temporary storage purpose, buffer 280 may not be counted, in one
embodiment, as part of the tags available for tracking externally cached blocks since buffer 280 may be used for
temporary storage only. The operation of buffer 280 will be described in detail herein.

[0039] There is coupled to tag array 250 a snooping logic 260, representing the circuitry employed to monitor memory
access requests on common bus 108 of Fig. 1. In one embodiment, snooping logic 260 is substantially similar to the
conventional snooping logic employed in each of processing nodes 102, 104, and 106 for permitting those processing
nodes to monitor memory access requests on common bus 108.

[0040] Within each tag, e.g., tag 273, there is a state field 272(a), an address field 272(b), and an optional valid flag
272(c). Optional valid flag 272(c) indicates whether a tag is allocated or is empty. In one embodiment, state field
272(a) may store one of the three states (eM, eS, and eI) although additional states may be employed if desired. Fig. 7B
shows, in one embodiment of the present invention, the various available states that may be stored in state field
272(a) of the tags of tag array 250. In Fig. 7B, three possible external states are shown: eI, eS, and eM, signifying
respectively that an invalid, shared, and exclusive copy of a memory block is being cached by an external device. Note that
these external states are different from the global states in that the external states reflect the states of the memory
block from the perspective of the external domain. On the other hand, the global states (reflected in the Mtags) reflect
the states of the memory blocks from the perspective of the internal domain. Address field 272(b) stores the address
of the memory block cached, thereby permitting coherence transformer 200 to track the memory blocks that both are
currently cached by an external device and tracked within snoop tag array 250.
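
The tag layout described above can be sketched in Python as follows. The class and function names are hypothetical; the three fields correspond to state field 272(a), address field 272(b), and optional valid flag 272(c), and the array size of four is chosen only because four tags are shown in the figure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExtState(Enum):
    eI = "invalid"
    eS = "shared"
    eM = "exclusive"


@dataclass
class Tag:
    state: ExtState = ExtState.eI   # state field 272(a)
    address: Optional[int] = None   # address field 272(b)
    valid: bool = False             # optional valid flag 272(c): allocated?


def find_tag(tag_array, address):
    """Return the allocated tag tracking `address`, or None if untracked."""
    for tag in tag_array:
        if tag.valid and tag.address == address:
            return tag
    return None


# Four empty tags, then one allocated for an exclusively held block.
tag_array = [Tag() for _ in range(4)]
tag_array[0] = Tag(ExtState.eM, 0xFF5, True)

tracked = find_tag(tag_array, 0xFF5)
untracked = find_tag(tag_array, 0x123)
```

A linear scan is used here purely for readability; the text does not say how the lookup is organized in hardware.
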

[0041] It should be apparent to those skilled in the art from the foregoing that some type of protocol conversion may
be necessary to permit devices and systems utilizing different protocols and/or operating at different speeds to share
memory blocks. Protocol transformer logic 262 represents the circuitry that permits coherence transformer 200 to
communicate with an external device, e.g., one of external devices 202, 204, and 206. Protocol transformer logic 262
may be omitted, for example, if the external device employs the same protocol as that employed in computer system
100 or operates at the same speed. Keep in mind that the specific protocol employed to communicate with a specific
external device may vary greatly depending on the specification of the protocol employed by that external device. As
will be discussed in greater detail herein, it is assumed that communication for the purpose of sharing memory blocks
with the external devices can be accomplished using a generalized protocol known as the X-protocol. The adaptation
of the described X-protocol to a specific external device should be readily apparent to those skilled in the art given this
disclosure.

[0042] In one embodiment of the present invention, a coherence transformer tracks as many of the memory blocks 
cached by the external device as it can in the tags of snoop tag array 250. As will be discussed in detail later herein, 

20 f whenever coherence transformer 200 can track an externally cached memory block in a tag within snoop tag array 
250 for the entire duration of that the block is externally cached, i.e., there is room in snoop tag array 250 for such 
tracking, there is no need to change the Mtag state of that memory block within memory module 110. Advantageously, 
when the system operates in the snoop-only approach, adverse impact on common bus 108 is minimized since there 
is no need to take up the bandwidth of common bus 108 to perform a write to memory module 110 to change the Mtag
of externally cached memory blocks.

[0043] On the other hand, when there is no more room in snoop tag array 250 for such tracking, the embodiment 
still allows a memory block within memory module 110 to be externally cached. Instead of tracking this externally 
cached memory block in snoop tag array 250 for the entire duration that the block is externally cached, however,
coherence transformer 200 merely tracks this memory block temporarily in a buffer that is especially reserved for this
purpose, e.g., buffer 280, and writes the new Mtag back into memory 110 at the earliest opportunity. Once the new
Mtag is written into memory 110, there is advantageously no need to continue to track this memory block, and buffer 
280 can be made available again (via a flag, for example) to temporarily track another externally cached memory block. 
[0044] The coherence transformer then monitors memory access requests on the bus of computer system 100. If 
one of the memory access requests on the bus of computer system 100 pertains to a memory block currently cached
by an external device and tracked in snoop tag array 250 (whether in tags 273-278 or temporarily in buffer 280), the
coherence transformer enters the snoop-only mode. As the term is used herein, the snoop-only mode pertains to the 
mode wherein coherence transformer 200, having found a match between the requested memory block and one of 
the memory blocks tracked in snoop tag array 250, intervenes to provide the latest copy of that memory block, instead 
of allowing memory module 110 to respond to the outstanding memory access request. In this manner, coherence
problems are advantageously avoided.

[0045] If the outstanding memory access request does not pertain to a memory block currently tracked in snoop tag
array 250 (whether in tags 273-278 or temporarily in buffer 280), the requested memory block either is not cached by
an external device, or is currently cached by an external device but its Mtag state in memory module 110 has already
been modified. In either case, coherence transformer 200 does nothing for the moment. This is because there would be
no harm in allowing memory module 110 to respond and to subsequently handle this memory access request using the
Mtag-only approach.

[0046] Fig. 8 illustrates, in accordance with one embodiment of the present invention, the steps involved for coher- 
ence transformer 200 to respond to a memory access request outstanding on common bus 108, i.e., one originated 
in the internal domain. In step 502, a memory access request appears on common bus 108 and is seen by coherence
transformer 200, which is coupled thereto. In step 504, coherence transformer 200 ascertains whether the requested
memory block, i.e., the one requested in the internally originated memory access request on common bus 108, matches
one of the memory blocks tracked by the tags in snoop tag array 250. 

[0047] If there is a match, the method proceeds to step 506 wherein the outstanding memory access request on common
bus 108 is handled in accordance with the snoop-only approach (described in detail in section A herein). On the other
hand, when there is not a match, the method proceeds to step 508 wherein the outstanding memory access request on
common bus 108 is handled in accordance with the Mtag-only approach (described in detail in section B herein).
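
The decision of Fig. 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the circuitry of the embodiment: the names `SnoopTagArray` and `handle_bus_request`, the dictionary representation of the tags, and the string return values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SnoopTagArray:
    """Hypothetical stand-in for snoop tag array 250 (address -> external state)."""
    tags: dict = field(default_factory=dict)

    def matches(self, address: int) -> bool:
        # Step 504: is the requested memory block tracked by any tag?
        return address in self.tags


def handle_bus_request(array: SnoopTagArray, address: int) -> str:
    """Steps 504-508 of Fig. 8 for a request seen on common bus 108."""
    if array.matches(address):
        return "snoop-only"  # step 506, section A
    return "Mtag-only"       # step 508, section B


array = SnoopTagArray(tags={0x1000: "eM"})
print(handle_bus_request(array, 0x1000))  # block tracked in a tag
print(handle_bus_request(array, 0x2000))  # block not tracked
```

The sketch models only the dispatch; the handling performed under each approach is described in sections A and B.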
[0048] Fig. 9 illustrates, in accordance with one embodiment of the present invention, the steps involved for coherence
transformer 200 to handle a memory access request originated in the external domain, i.e., issued by one of the
external devices. In step 604, coherence transformer 200 receives from the external device a command to cache a
specific memory block within memory module 110. In step 606, coherence transformer 200 ascertains whether there is
room in its snoop tag array 250 to track this memory block for the entire duration of its being externally cached. In one
embodiment, the check in step 606 involves determining whether there is one unused tag, besides the buffer(s) set
aside for temporary storage of memory blocks in transition when all other tags are full, to track the externally requested
memory block. 

[0049] If there is room in snoop tag array 250 to track this memory block for the entire duration of its being externally
cached, the method proceeds to step 608 wherein the external memory access request is handled in accordance with
the snoop-only approach (described in detail in section A herein).
[0050] On the other hand, if there is no room to track this externally requested memory block for the entire duration
of its being externally cached, the method proceeds to step 610 wherein the external memory access request is handled
in accordance with the Mtag-only approach (described in detail in section B herein).
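
The room check of step 606 can be illustrated as follows. This is a minimal sketch; the function name `choose_external_approach`, its parameters, and the idea of counting tags with integers are assumptions made for illustration only.

```python
def choose_external_approach(tags_in_use: int, total_tags: int,
                             reserved_buffers: int = 1) -> str:
    """Step 606 of Fig. 9: is there room to track the block long term?"""
    if reserved_buffers < 0 or reserved_buffers > total_tags:
        raise ValueError("reserved_buffers must lie between 0 and total_tags")
    # Entries set aside as temporary buffers (e.g., buffer 280) do not count as room.
    long_term_tags = total_tags - reserved_buffers
    if tags_in_use < long_term_tags:
        return "snoop-only"  # step 608, section A
    return "Mtag-only"       # step 610, section B


# Seven entries in total, one of them reserved as a temporary buffer.
print(choose_external_approach(tags_in_use=5, total_tags=7))
print(choose_external_approach(tags_in_use=6, total_tags=7))
```

With one entry reserved, six tags remain for long-term tracking, so the second call falls back to the Mtag-only approach.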

[0051] To increase the likelihood that a new externally originated request can advantageously employ the
snoop-only approach (thereby saving the bandwidth of common bus 108), the coherence transformer may opt to
replace a snoop tag (e.g., 273 in Fig. 7A). In other words, the coherence transformer may opt to unallocate a snoop
tag although the memory block it tracks is still cached by an external device. To do this, the coherence transformer
writes the block's Mtag into memory module 110 as a function of the current external state. For example, if the external
state is eM, the Mtag should be set to gI, eS sets Mtag to gS, and eI sets Mtag to gM. The algorithm employed to select
which tag to replace may follow any conventional cache replacement algorithm, e.g., least recently used (LRU), random,
first-in-first-out (FIFO), or the like.
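
As an illustration of tag replacement, the sketch below evicts the least recently used tag and writes back an Mtag chosen from the external state (eM to gI, eS to gS, eI to gM, writing the invalid states as eI and gI). The class `LruSnoopTags`, its methods, and the dictionary standing in for the Mtags of memory module 110 are hypothetical; LRU is only one of the replacement policies the text permits.

```python
from collections import OrderedDict

# Mapping stated in the text: external state -> Mtag written to memory module 110.
MTAG_FOR_EXTERNAL_STATE = {"eM": "gI", "eS": "gS", "eI": "gM"}


class LruSnoopTags:
    """Hypothetical snoop tag store with least-recently-used replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tags = OrderedDict()  # address -> external state, oldest first
        self.memory_mtags = {}     # stands in for the Mtags in memory module 110

    def touch(self, address: int) -> None:
        # Mark an already tracked block as most recently used.
        self.tags.move_to_end(address)

    def allocate(self, address: int, state: str) -> None:
        if address not in self.tags and len(self.tags) >= self.capacity:
            # Unallocate the oldest tag and record its Mtag in memory.
            victim, victim_state = self.tags.popitem(last=False)
            self.memory_mtags[victim] = MTAG_FOR_EXTERNAL_STATE[victim_state]
        self.tags[address] = state
        self.tags.move_to_end(address)


tags = LruSnoopTags(capacity=2)
tags.allocate(0x1000, "eM")
tags.allocate(0x2000, "eS")
tags.touch(0x1000)           # 0x2000 becomes the least recently used tag
tags.allocate(0x3000, "eM")  # evicts 0x2000 and writes Mtag gS for it
print(tags.memory_mtags)
```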

[0052] The unallocating of tags may, in some cases, be particularly advantageous, especially if the newly requested 
block is more active than the old one. Thus, the bandwidth on common bus 108 saved on the new requests may exceed
the bandwidth consumed in writing the old block's Mtag back to memory module 110. 

Section A: Snoop Only approach

[0053] In the snoop only approach, each externally cached memory block is tracked in a tag in snoop tag array 250. 
The externally cached memory block, once tracked, will continue to be tracked until the cached memory block is written 
back into memory module 110, thereby freeing up the tag to track another externally cached memory block. 

[0054] When there is a memory access request, e.g., one having a format of memory access request 400 of Fig. 4,
on common bus 108, coherence transformer 200 (via coherence transformer link 220) monitors this memory access
request and checks address field 404 of the memory access request against the addresses of the memory blocks
cached by one of the external devices. With reference to Fig. 10A, this checking is performed, in one embodiment, by
comparing address field 404 against the addresses stored in address fields 272(b) of the tags within the tag array 250.

[0055] If there is an address match, the state of the matched tag is then checked to ascertain whether the memory
block cached by the external device is of the appropriate type to service the outstanding memory access request. This 
is because an external device may currently have only an invalid copy of a memory block and would therefore be 
incapable of servicing either a RTO or a RTS memory access request. 

[0056] If the state of the matched tag indicates that the externally cached memory block is the appropriate copy for
servicing the outstanding memory access request, snooping logic 260 of coherence transformer 200 may then set own
flag 408 to signify that the default response should be overridden, i.e., memory module 110 should not respond to the
outstanding memory access request since there may be a more current copy cached by one of the external devices.

[0057] Own flag 408 of memory access request 400, while logically associated with the memory access request, is
skewed in time therefrom in one embodiment. In this manner, the own flag may arrive a few cycles later than the rest
of the memory access request to allow time for entities, such as coherence transformer 200, to ascertain whether they
should respond to the memory access request with a more recent copy of the requested memory block than that
available in memory module 110.

[0058] Coherence transformer 200 then obtains the appropriate copy of the requested memory block from the external
device, using its knowledge of which external device currently holds the most recent copy. Coherence transformer
200 then formulates a response 500 to return the appropriate copy of the requested memory block to common bus
108 to be forwarded to the requesting entity, i.e., one identified by the source ID in the issued memory access request.
[0059] As mentioned earlier, whenever memory block 112 is cached by one of the external devices, a tag is employed
in tag array 250 of coherence transformer 200 to track the fact that this memory block is being externally cached, and
also the external state of that memory block. In this manner, coherence transformer 200 can keep track of which
memory block of computer system 100 has been cached by the external devices and (in state field 272(a) of the
matching tag) which type of copy was actually cached.

[0060] When the state in the tag is eI, either the external devices do not have a copy of the requested memory block
(even if there is a match between the incoming address and one of the addresses stored in tag array 250) or one of
the external devices does have a copy but this memory block is not tracked by snoop tag array 250 since its
corresponding Mtag state has already been properly reflected in memory module 110. In this case, the invention
advantageously treats the requested memory block as if it is not cached by the external domain, and simply ignores the
present memory access request.

[0061] If the state in the matching tag is eS, at least one of the external devices owns a shared, read-only copy
of the requested memory block. If the state in the matching tag is eM, one of the external devices owns an exclusive
copy of the requested memory block, which it can use to respond to, for example, a RTO memory access request.
Further, the external device owning the exclusive copy can unilaterally modify this copy without having to inform other
bus entities attached to common bus 108.

[0062] The operation of coherence transformer 200 may be more clearly understood with reference to Figs. 10 and
11. Fig. 10 illustrates, in one embodiment of the present invention, selected transactions performed by coherence
transformer 200 in response to memory access requests on common bus 108.

A1. RTO Request on Bus

[0063] Referring now to Fig. 10, when a RTO memory access request is issued by one of the bus entities on common
bus 108 (as the term is used hereinafter, a "bus entity" refers to any entity such as a processing unit or any other device
that is coupled to common bus 108 for sharing a memory block), this RTO memory access request is forwarded to all
bus entities, including coherence transformer 200. Coherence transformer 200 then ascertains whether the address
of the requested memory block matches one of the addresses stored in tag array 250 of coherence transformer 200.
[0064] If there is an address match, the current state of the matching tag is then ascertained to determine whether 
the copy cached by one of the external devices is of the appropriate type for responding to the memory access request 
on common bus 108. If the memory access request is a request for an exclusive copy of a memory block (a RTO) or 
a request for a shared copy of a memory block (a RTS), and the current state of the matching tag is eI (invalid),
coherence transformer 200 ignores the present memory access request, since either the external devices never
cached the requested memory block or one of the external devices does have a copy but this memory block is not
tracked by snoop tag array 250 since its corresponding Mtag state has already been properly reflected in memory
module 110.

[0065] If the memory access request on common bus 108 is a RTO (the first RTO of this transaction) and the current
state of the matching tag is eS, coherence transformer 200 needs to invalidate any shared external copy or copies
currently cached by one or more of the external devices. This invalidation is illustrated in Fig. 10 by the XINV command,
which is an X-protocol invalidate command directed at every external device currently having a shared external copy.
Following the invalidation, the new state of the memory block in the external device is invalid (New State = eI).

[0066] Upon confirmation that the external device has invalidated its shared copy of the requested memory block
(via the X-protocol command XINV_ack), coherence transformer 200 then downgrades the state of the matching tag
to invalid (New State = eI) to reflect the fact that there is no longer a valid external copy. Coherence transformer 200
then obtains a copy of this requested memory block from computer system 100 and invalidates all internal copies
cached by bus entities within computer system 100. Both these actions are accomplished when coherence transformer
200 issues a RTO command (the second RTO of this transaction) to common bus 108 and receives the requested
data (via the RTO_data response to the second RTO). The copy of the requested memory block is then sent to common
bus 108 to be forwarded to the entity that originally issued the RTO memory access request (via the RTO_data response
to the first RTO). In one embodiment, the coherence transformer passes Mtag gM with the final RTO data. Alternatively,
the coherence transformer may first update the Mtag in memory module 110 and then respond to the RTO with data.
[0067] Note that the use of the XINV command advantageously invalidates all shared copies of the requested memory 
block cached by the external device(s). Further, the use of the RTO request by coherence transformer 200 to common
bus 108 advantageously ensures that all shared copies within computer system 100 are invalidated and obtains the 
required memory block copy to forward to the requesting entity. 

[0068] The current state of the matching tag may be an eM when a RTO memory access request appears on common 
bus 108. The eM state signifies that an external device currently caches an exclusive (and potentially modified) copy 
of the memory block being requested. In this case, coherence transformer 200 may obtain the exclusive (and potentially
modified) copy of the requested memory block from the external device and return that copy to the entity that originally
issued the RTO request on common bus 108. In one embodiment, the coherence transformer passes Mtag gM with
the final RTO data. Alternatively, the coherence transformer may first update the Mtag in memory module 110 and then
respond to the RTO with data. 

[0069] As shown in Fig. 10, coherence transformer 200 may issue a RTO-like transaction using the X-protocol XRTO
transaction to request the exclusive copy of the memory block, which is currently being cached by one of the external
devices. If there are multiple external devices coupled to coherence transformer 200, conventional logic may be
provided with coherence transformer 200, in one embodiment, to allow coherence transformer 200 to determine
which external device currently holds the desired exclusive copy of the requested memory block.
[0070] The requested copy of the memory block is then returned to coherence transformer 200 from the external 
device that currently holds it (using the XRTO_data command, which is analogous to the aforementioned RTO_data 
except accomplished using the X-protocol). Further, the external copy that was previously cached by the external device
is downgraded to an invalid copy. This downgrade is tracked in the matching tag in tag array 250, thereby changing
the state to eI (New State = eI). After coherence transformer 200 receives the exclusive copy of the requested memory
block from the external device that previously cached it, coherence transformer 200 formulates a response to the
original RTO, e.g., using an RTO_data response in a format similar to that shown in Fig. 5, to furnish the requested
exclusive copy of the memory block to common bus 108 to be forwarded to the entity that originally issued the RTO
memory access request. In one embodiment, the coherence transformer passes Mtag gM with the final RTO data.
Alternatively, the coherence transformer may first update the Mtag in memory module 110 and then respond to the 
RTO with data. 
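
The handling of a RTO seen on the bus, as described in this subsection, can be summarized by the following sketch. The function name `snoop_rto`, the use of strings for states and messages, and the convention of passing `None` when no tag matches are assumptions made for illustration; the message order follows the description above.

```python
def snoop_rto(tag_state):
    """Return (messages, new_tag_state) for a RTO seen on common bus 108.

    tag_state is "eI", "eS", "eM", or None when no tag matches.
    """
    if tag_state in (None, "eI"):
        # Ignore the request; the transformer does not intervene.
        return [], tag_state
    if tag_state == "eS":
        # Invalidate external shared copies, then obtain the data with a second RTO.
        return ["XINV", "XINV_ack", "RTO (second)",
                "RTO_data (second)", "RTO_data (first)"], "eI"
    if tag_state == "eM":
        # Pull the exclusive copy from the external device and forward it.
        return ["XRTO", "XRTO_data", "RTO_data (first)"], "eI"
    raise ValueError(f"unknown external state: {tag_state!r}")


for state in (None, "eI", "eS", "eM"):
    print(state, snoop_rto(state))
```

In both the eS and eM cases the tag ends in the invalid state, since the requesting bus entity becomes the exclusive owner.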

A2. RTS Request on Bus

[0071] If the memory access request on common bus 108 represents a request for a shared, read-only copy of a 
memory block, i.e., a RTS (the first RTS), and the current state of the matching tag is eI (invalid), coherence transformer
200 will ignore the outstanding RTS memory access request even if there is a match between the incoming address 
and one of the addresses stored in the tags of tag array 250. 

[0072] On the other hand, if the current state of the matching tag is eS (i.e., one or more of the external devices
currently cache shared, read-only copies of the requested memory block), coherence transformer 200 may, in one
embodiment, obtain the shared, read-only copy of the requested memory block from computer system 100 itself, e.g.,
by issuing a RTS request to common bus 108 (the second RTS request). After coherence transformer 200 receives
the shared, read-only copy from computer system 100 (via the RTS_data response to the second RTS), it then forwards
the copy to common bus 108 to be forwarded to the bus entity that originally issued the RTS command (via the
RTS_data response to the first RTS).

[0073] If the memory access request on common bus 108 is a RTS and the current state of the matching tag is eM,
coherence transformer 200 may obtain the copy of the memory block that is currently exclusively owned by one of the
external devices. Further, coherence transformer 200 may downgrade that external copy to a shared copy, and return
the data to common bus 108 to be forwarded to the entity that originally issued the RTS memory access request. To
accomplish the foregoing, coherence transformer 200 may issue an X-protocol RTS-like transaction (XRTS) to the
external device that currently exclusively owns the requested memory block. That external device will return the copy it
previously owned as an exclusive copy to coherence transformer 200 (XRTS_data) and also downgrade the external
copy from an exclusive copy to a shared copy (New State = eS in the matching tag). When coherence transformer
200 receives the copy of the memory block from the external device, it can forward that copy to common bus 108 (via
the RTS_data command) to be forwarded to the entity that originally issued the RTS memory access request. The
coherence transformer may then write the data and Mtag gS into memory module 110 with the WB transaction.
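
The three RTS cases of this subsection can be laid out as a small lookup table. The table name `RTS_ON_BUS` and the helper `snoop_rts` are hypothetical. The text does not state a new tag state for the eS case; the sketch assumes it remains eS.

```python
# current tag state -> (X-protocol request, bus transactions issued by the
# coherence transformer, new tag state)
RTS_ON_BUS = {
    "eI": (None, (), "eI"),                      # ignore the request
    "eS": (None, ("RTS", "RTS_data"), "eS"),     # second RTS; state assumed unchanged
    "eM": ("XRTS", ("RTS_data", "WB"), "eS"),    # downgrade external copy, write back gS
}


def snoop_rts(tag_state: str):
    """Look up the response to a RTS seen on common bus 108."""
    if tag_state not in RTS_ON_BUS:
        raise ValueError(f"unknown external state: {tag_state!r}")
    return RTS_ON_BUS[tag_state]


for state in ("eI", "eS", "eM"):
    print(state, snoop_rts(state))
```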

A3. WB Request on Bus

[0074] The memory access request on common bus 108 may represent a write back (WB) request, i.e., one signifying that
a bus entity coupled to common bus 108, other than coherence transformer 200, wishes to write back the exclusive
copy of the memory block it currently owns. In this situation, the response of coherence transformer 200 depends on
the state of the copy of the memory block currently cached by the external device. Generally, the entity that issues the
write back memory access request owns the exclusive copy of that memory block, and any copy that may have been
cached by an external device earlier must be invalid by the time the write back memory access request is asserted by
its owner on common bus 108. Consequently, the current state in the matching tag, if any, should be eI (invalid), in
which case coherence transformer 200 does nothing and ignores the outstanding write back transaction on common
bus 108.

[0075] If, for some reason, the current state of a matching tag is eS or eM, an error condition would be declared
since there cannot be another shared or exclusive copy in the network if the write back entity already has an exclusive 
copy of the memory block. The resolution of this error condition is conventional and may include, for example, flagging 
the error and performing a software and/or a hardware reset of the system. 
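
The write back rule above reduces to a simple consistency check, sketched below. The exception type `CoherenceError` and the function `snoop_write_back` are hypothetical names; how the error is resolved (flagging, reset) is left outside the sketch.

```python
class CoherenceError(RuntimeError):
    """Hypothetical error type for the condition described in the text."""


def snoop_write_back(tag_state):
    """Response to a WB seen on common bus 108 (None means no matching tag)."""
    if tag_state in (None, "eI"):
        return "ignore"  # the writer owns the only valid copy
    if tag_state in ("eS", "eM"):
        # A valid external copy cannot coexist with the writer's exclusive copy.
        raise CoherenceError(f"write back seen while external state is {tag_state}")
    raise ValueError(f"unknown external state: {tag_state!r}")


print(snoop_write_back("eI"))
try:
    snoop_write_back("eM")
except CoherenceError as error:
    print("error condition:", error)
```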

[0076] Coherence transformer 200 not only interacts with the processing nodes within computer system 100 to
respond to memory access requests issued by those processing nodes, it also interacts with the external devices, e.g.,
one of external devices 202, 204, and 206, in order to service memory access requests pertaining to memory blocks 
having local physical addresses within computer system 100. Fig. 11 illustrates, in accordance with one embodiment, 
selected transactions performed by coherence transformer 200 in response to memory access requests from one of
the external devices.

[0077] In Fig. 11, the memory access requests are issued by one of the external devices, e.g., one of devices 202,
204, or 206, to coherence transformer 200. If another external device currently caches the required copy of the
requested memory block, this memory access request may be handled by logic circuitry provided with coherence
transformer 200 without requiring the attention of common bus 108.

[0078] On the other hand, if another external device does not have the valid copy of the requested memory block to 
service the external memory access request, coherence transformer 200 then causes a memory access request to 
appear on common bus 108, using a protocol appropriate to computer system 100, so that coherence transformer 200
can obtain the required copy of the requested memory block on behalf of the requesting external device. With reference
to Fig. 9, the snoop-only approach to XRTO, XRTS, and XWB of this section A assumes that there is room in snoop
tag array 250 to track the memory block to be externally cached for the entire duration that this memory block is externally
cached (step 608 of Fig. 9). If there is not enough room, the embodiment preferably employs the Mtag-only approach 
to handle the memory requests externally originated (step 610 of Fig. 9). 

[0079] In the remaining discussion of section A, since a copy of the memory block is now cached by an external 
device and serviced in accordance with the snoop-only approach, this memory block is tracked in a tag in tag array
250 of coherence transformer 200. 

[0080] In one embodiment, the coherence transformer always asserts the own flag on bus transactions for blocks 
that are externally cached as exclusive or may be externally cached as shared. This advantageously allows the
coherence transformer to take more time to correctly handle such requests.

A4. XRTO Request 

[0081] Referring now to Fig. 11, when an external device issues a memory access request to obtain an exclusive
copy of a memory block having a local physical address within computer system 100, e.g., memory block 112(a),
coherence transformer 200 first determines whether the address of the requested memory block matches one of the
addresses stored in tag array 250 of coherence transformer 200. If there is a match, the current state of the tag that
matches the incoming address, i.e., the matching tag, is then ascertained to determine whether an external device,
e.g., any of the external devices that couple to coherence transformer 200, has cached a copy of the requested memory
block.

[0082] If the current state of the matching tag is eI (invalid), coherence transformer 200 proceeds to obtain the
requested memory block from common bus 108. This is because a current state invalid (eI) indicates that either none
of the external devices currently caches a valid copy (whether a shared, read-only copy or an exclusive copy) of the
requested memory block or that an external device is currently caching a valid copy of the requested memory block but
this fact is not tracked in snoop tag array 250 since the Mtag corresponding to the requested memory block has already
been properly updated in memory module 110. In this case, the embodiment advantageously treats the memory block as if
it is not cached by one of the external devices, thereby permitting coherence transformer 200 to request this memory block
from the internal domain.

[0083] Further, since the requested memory block will be cached by the requesting external device, e.g., I/O device
202, after the current memory access request is serviced (since it has already been ascertained in step 606 of Fig. 9
that there is room in snoop tag array 250), this requested memory block needs to be tracked within tag array 250 of
coherence transformer 200 so that the next memory access request pertaining to this memory block can be serviced
by coherence transformer 200 on behalf of the external device, e.g., I/O device 202, which then has an exclusive (and
potentially modified) copy. An unused tag in tag array 250, e.g., one that has a current state of invalid or is simply
unused, may be employed for tracking the newly cached memory block along with the state of the copy (e.g., eM, eS,
or eI). Fig. 11 shows this situation wherein the old tag has state eI.

[0084] Referring back to the case in Fig. 11 where there exists an XRTO memory access request from an external
device and the current state of the matching tag is eI or there is no tag that matches, coherence transformer 200 acts
as another bus entity coupled to common bus 108, i.e., it communicates with common bus 108 using a protocol
appropriate to computer system 100 to issue a memory access request for an exclusive copy of the requested memory
block. In other words, coherence transformer 200 simply issues a RTO memory access request to common bus 108.

[0085] The presence of the request-to-own (RTO) memory access request on common bus 108 causes one of the
bus entities, e.g., one of processing nodes 102, 104, and 106, or memory module 110, to respond with the latest copy
of the requested memory block (RTO_data transaction in Fig. 11). After coherence transformer 200 receives the
exclusive copy of the requested memory block from common bus 108, it then forwards this exclusive copy to the requesting
external device using a protocol that is appropriate for communicating with the requesting external device (generalized
as the X-protocol XRTO_data command herein). Further, the new state of the tag that tracks this requested memory
block is now upgraded to an eM state, signifying that an external device is currently caching an exclusive (and potentially
modified) copy of this memory block.
[0086] If one of the external devices, e.g., I/O device 202, issues a read-to-own memory access request (using the
X-protocol XRTO) for a given memory block and a shared, read-only copy of that memory block has already been
cached by a sister external device, e.g., coherent domain device 204, there would already be a tag in tag array 250
for tracking this memory block. However, the state of such tag will reflect an eS copy since the sister external device
only has a shared, read-only copy. In this case, there is no need to allocate a new tag to track the requested memory
block. Coherence transformer 200 must still invalidate all other shared copies of this memory block in computer system
100 and on the sister external devices, as well as upgrade the state of the matching tag to an eM state.
[0087] To invalidate the shared copies at the sister external devices, coherence transformer 200 may issue an
invalidate command (XINV) to those sister external devices and wait for the acknowledgment message (XINV_ack). To
invalidate shared, read-only copies of the requested memory block on the bus entities in computer system 100,
coherence transformer 200 issues a request-to-own (RTO) memory access request to common bus 108. This RTO
command both obtains a copy of the requested memory block (RTO_data transaction) and invalidates the shared,
read-only copies cached by the bus entities in computer system 100.

[0088] After coherence transformer 200 receives the copy of the requested memory block from common bus 108 
(via the RTO_data transaction), coherence transformer 200 may then forward this copy to the requesting external
device to service the XRTO memory access request (XRTO_data transaction). Further, the state associated with the 
matching tag in tag array 250 may be upgraded from an eS (shared) state to an eM (exclusive) state. 
[0089] If the memory access request received by coherence transformer 200 is a request for an exclusive memory
block (XRTO) from an external device and a sister external device is currently caching the exclusive copy of that
memory block, logic circuitry provided with coherence transformer 200 preferably obtains the requested memory block
from the sister external device to satisfy the XRTO request without requiring the attention of coherence transformer
200 itself. As a general rule, if there is more than one external device, they may, in one embodiment, resolve memory
access requests by passing copies of memory blocks among themselves before asking for them from common bus 108
(via coherence transformer 200). On the other hand, if the XRTO memory access request for a memory block comes
from an external device that already is currently caching the exclusive copy of the same requested memory block, an
error condition exists as shown in Fig. 11. The error condition may be handled using a variety of conventional techniques,
e.g., flagging the error and/or performing a software or hardware reset. Further, in one embodiment, the coherence
transformer could handle XRTOs to externally cached blocks by forwarding requests to sibling devices.
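
The XRTO cases of this subsection can be gathered into one sketch. The names `handle_xrto` and `CoherenceError`, the set of holder device numbers, and the textual step labels are assumptions for illustration. In particular, the text does not specify how holders are recorded, and the sketch assumes the tag stays in state eM when a sister device hands over its exclusive copy.

```python
class CoherenceError(RuntimeError):
    """Hypothetical error type for the error condition shown in Fig. 11."""


def handle_xrto(tag_state, holders, requester):
    """Return (steps, new_tag_state, new_holders) for an XRTO from `requester`.

    tag_state is "eI", "eS", "eM", or None when no tag matches; holders is
    the set of external devices currently caching the block.
    """
    if tag_state in (None, "eI"):
        # Act as an ordinary bus entity: issue a RTO and forward the data.
        return ["RTO", "RTO_data", "XRTO_data"], "eM", {requester}
    if tag_state == "eS":
        sisters = sorted(holders - {requester})
        steps = [f"XINV to {d}" for d in sisters]
        steps += [f"XINV_ack from {d}" for d in sisters]
        # The RTO both fetches the data and invalidates internal shared copies.
        steps += ["RTO", "RTO_data", "XRTO_data"]
        return steps, "eM", {requester}
    if tag_state == "eM":
        if requester in holders:
            raise CoherenceError("requester already caches the exclusive copy")
        # Served among the external devices without using common bus 108.
        return ["obtain copy from sister device"], "eM", {requester}
    raise ValueError(f"unknown external state: {tag_state!r}")


steps, state, holders = handle_xrto("eS", {204, 206}, requester=202)
print(steps, state, holders)
```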

A5. XRTS Request

[0090] When an external device issues a memory access request to obtain a shared, read-only copy of a memory 
block having a local physical address within computer system 100 such as memory block 112(a) (via an XRTS
command), coherence transformer 200 first determines whether the address of the requested memory block matches one
of the addresses stored in tag array 250 of coherence transformer 200. If there is a match, the state of the matching tag
is then ascertained to determine whether an external device, e.g., any of the external devices that couple to coherence
transformer 200, has cached a copy of the requested memory block.

[0091] If the current state of the matching tag is eI (invalid), coherence transformer 200 proceeds to obtain the
requested memory block from common bus 108. This is because a current state invalid (eI) indicates that either none
of the external devices currently caches a valid copy (whether a shared, read-only copy or an exclusive copy) of the
requested memory block or that an external device is currently caching a valid copy of the requested memory block but
this fact is not tracked in snoop tag array 250 since the Mtag corresponding to the requested memory block has already
been properly updated in memory module 110. In this case, the embodiment advantageously treats the memory block as if
it is not cached by one of the external devices, thereby permitting coherence transformer 200 to request this memory block
from the internal domain.

[0092] Further, since the requested memory block will be cached by the requesting external device, e.g., I/O device
202, after the current memory access request is serviced, this requested memory block needs to be tracked within tag
array 250 of coherence transformer 200 (step 606 already ascertained that there is room for tracking the requested
memory block in snoop tag array 250). An unused tag in tag array 250, e.g., a tag that has a current state of invalid
or is simply unused, may be employed for tracking the newly cached memory block, along with the state of the copy
(i.e., eS).

[0093] Referring back to the case in Fig. 11 where there exists an XRTS memory access request from an external
device and the current state of the matching tag is eI or there is no tag that matches, coherence transformer 200 may
act simply as another bus entity coupled to common bus 108, i.e., it communicates with common bus 108 using a
protocol appropriate to computer system 100 to issue a memory access request for a shared, read-only copy of the
requested memory block. In other words, coherence transformer 200 simply issues a RTS memory access request to
common bus 108.

[0094] The presence of the request-to-share (RTS) memory access request on common bus 108 causes one of the
bus entities, e.g., one of processing nodes 102, 104, and 106, or memory module 110, to respond with the shared copy
of the requested memory block (RTS_data transaction in Fig. 11). After coherence transformer 200 receives the shared,
read-only copy of the requested memory block from common bus 108, it then forwards this shared, read-only copy to
the requesting external device using a protocol that is appropriate for communicating with the requesting external
device (generalized as the X-protocol XRTS_data command herein). Further, the new state of the tag that tracks this
requested memory block is now upgraded to an eS state, signifying that an external device is currently caching a
shared, read-only copy of this memory block.

[0095] If one of the external devices, e.g., I/O device 202, issues a read-to-share memory access request (using the
X-protocol XRTS) for a given memory block and a shared, read-only copy of that memory block has already been
cached by a sister external device, e.g., coherent domain device 204, there would already be a tag in tag array 250
for tracking this memory block. Further, the state of such tag will reflect an eS copy. In this case, there is no need to
allocate a new tag to track the requested memory block.

[0096] In one embodiment, logic circuitry associated with coherence transformer 200 may obtain the shared, read-only
copy of the requested memory block from the sister external device to satisfy the outstanding XRTS request. In
this embodiment, no action on common bus 108 is required. In another embodiment, coherence transformer 200 may
obtain the requested shared, read-only copy of the requested memory block from the bus entities in computer system
100 by issuing, as shown in Fig. 11, a request-to-share (RTS) memory access request to common bus 108.
[0097] After coherence transformer 200 receives the shared, read-only copy of the requested memory block from
common bus 108 (RTS_data transaction), coherence transformer 200 may then forward this copy to the requesting
external device to service the XRTS memory access request (XRTS_data transaction). Further, the state associated
with the matching tag in tag array 250 is maintained at an eS (shared) state.

[0098] If the memory access request received by coherence transformer 200 is a request for a shared, read-only
copy of a memory block (XRTS) from an external device and a sister external device is currently caching the exclusive
copy of that memory block, logic circuitry provided with coherence transformer 200 preferably obtains the requested
memory block from the sister external device (and downgrades the previously existing exclusive copy) to satisfy the
XRTS request without requiring the attention of coherence transformer 200 itself. On the other hand, if the XRTS
memory access request comes from an external device that already is currently caching the exclusive copy of the
requested memory block, an error condition exists as shown in Fig. 11. The error condition may be handled using a
variety of conventional techniques, e.g., flagging the error and/or performing a software or hardware reset.
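
The XRTS cases of section A5 can be summarized as a small decision function. The following Python sketch is illustrative only and is not part of the disclosed apparatus: the names `TagState`, `handle_xrts`, `requester_holds_exclusive`, and `serve_from_sibling` are hypothetical, and recording the downgraded sister copy as eS is an assumption, since the text above says only that the exclusive copy is downgraded.

```python
from enum import Enum


class TagState(Enum):
    EI = "eI"  # no valid external copy is tracked
    ES = "eS"  # an external device caches a shared, read-only copy
    EM = "eM"  # an external device caches the exclusive copy


def handle_xrts(tag_state, requester_holds_exclusive=False, serve_from_sibling=False):
    """Return (transactions, new_tag_state) for an XRTS request.

    tag_state is None when no tag in the tag array matches the address.
    """
    if tag_state is None or tag_state is TagState.EI:
        # Nothing is tracked externally: act as an ordinary bus entity.
        return ["RTS", "RTS_data", "XRTS_data"], TagState.ES
    if tag_state is TagState.ES:
        if serve_from_sibling:
            # One embodiment: a sister device supplies the copy, no bus action.
            return ["XRTS_data"], TagState.ES
        # Other embodiment: fetch the shared copy from the common bus.
        return ["RTS", "RTS_data", "XRTS_data"], TagState.ES
    # tag_state is TagState.EM
    if requester_holds_exclusive:
        raise RuntimeError("XRTS from the device that already owns the block")
    # A sister device owns the block and its exclusive copy is downgraded.
    # Assumption: the downgraded copy is recorded as eS.
    return ["XRTS_data"], TagState.ES
```

For example, `handle_xrts(None)` yields the RTS, RTS_data, XRTS_data sequence with a new tag state of eS, matching paragraphs [0093] and [0094].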

A6. XWB Request 

[0099] If the memory access request received by coherence transformer 200 is a write back transaction (X-protocol
XWB transaction), i.e., an external device wishes to write back the exclusive copy of a memory block it currently owns,
the actions of coherence transformer 200 depend on the state of the copy of the memory block currently cached by
the external device. Generally, the external device that issues the write back transaction owns the exclusive copy of
that memory block, and the current state of the matching tag in tag array 250 should show an eM (exclusive) state.
Consequently, if the current state in the matching tag is eI (invalid) or eS (shared, read-only), an error condition exists
as shown in Fig. 11. Again, this error condition may be handled using a variety of conventional techniques, including
flagging the error and/or performing a software or hardware reset.

[0100] If the current state of the matching tag in tag array 250 is an eM (exclusive) state, coherence transformer 200
proceeds to receive the data to be written back (via the X-protocol XWB_data command) and issues a WB memory
access request to common bus 108, to be followed up by the data (WB_data) (and also sets the Mtag to gM). Further,
the external copy of the requested memory block is downgraded accordingly from an exclusive copy to an invalid copy
(New State = eI).
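
As a rough illustration of the XWB handling in section A6, the sketch below maps the state of the matching tag to the resulting transactions and states. The function name `handle_xwb` and the dictionary layout are hypothetical, and rejecting any state other than eI, eS, or eM is an assumption made for completeness.

```python
def handle_xwb(tag_state):
    """Return the outcome of an XWB request for a given snoop-tag state."""
    if tag_state in ("eI", "eS"):
        # Only the owner of an exclusive copy may write back.
        raise ValueError(f"XWB is an error when the tag state is {tag_state}")
    if tag_state != "eM":
        # Assumption: any state outside eI/eS/eM is rejected.
        raise ValueError(f"unknown tag state: {tag_state!r}")
    return {
        "transactions": ["XWB_data", "WB", "WB_data"],
        "mtag": "gM",           # Mtag written along with the data
        "new_tag_state": "eI",  # the external copy is now invalid
    }
```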

[0101] To further clarify the details regarding the generalized X-protocol, which is employed by coherence transformer
200 in communicating with each external device, Tables 1 and 2 illustrate selected X-protocol requests and X-protocol
responses. It should be borne in mind that Tables 1 and 2 are shown for illustration purposes only and other requests
and responses may also be provided depending on needs. As mentioned earlier, the adaptation of the disclosed
generalized X-protocol transactions to work with a specific external coherence domain will depend greatly on the
specification of the protocol employed by the specific external device and is generally within the skills of one skilled in the art.
[0102] In Table 1, the X-protocol requests, the actions represented by the requests, and possible responses thereto
are shown. In Table 2, the X-protocol responses and the actions represented by the responses are shown. Table 2
further specifies whether a given response will be accompanied by data.

Table 1

REQUEST   ACTION                                                                  POSSIBLE RESPONSES
XRTO      Get exclusive copy of memory block                                      XRTO_data, XRTO_nack
XRTS      Get shared, read-only copy of memory block                              XRTS_data, XRTS_nack
XINV      Invalidate copy of memory block                                         XINV_ack
XWB       Request to write back currently cached exclusive copy of memory block   XWB_ack, XWB_nack



Table 2

RESPONSES   ACTION                                           DATA?
XRTO_data   Reply with exclusive copy of memory block        Y
XRTO_nack   Not acknowledged, retry XRTO progenitor          N
XRTS_data   Reply with shared copy of memory block           Y
XRTS_nack   Not acknowledged, retry XRTS progenitor          N
XINV_ack    acknowledged                                     N
XWB_ack     acknowledged, permitting XWB_data                N
XWB_data    write back with exclusive copy of memory block   Y
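
For readers who prefer a machine-checkable form, Tables 1 and 2 can be transcribed as two lookup tables. This Python sketch is only a transcription aid: the names `POSSIBLE_RESPONSES`, `CARRIES_DATA`, `is_possible_response`, and `carries_data` are hypothetical. The entries follow the tables as printed, so XWB_nack appears as a possible response to XWB but has no row in Table 2, and `carries_data` returns None for it.

```python
# Table 1: request -> possible responses
POSSIBLE_RESPONSES = {
    "XRTO": ("XRTO_data", "XRTO_nack"),
    "XRTS": ("XRTS_data", "XRTS_nack"),
    "XINV": ("XINV_ack",),
    "XWB": ("XWB_ack", "XWB_nack"),
}

# Table 2: response -> whether it is accompanied by data
CARRIES_DATA = {
    "XRTO_data": True,
    "XRTO_nack": False,
    "XRTS_data": True,
    "XRTS_nack": False,
    "XINV_ack": False,
    "XWB_ack": False,
    "XWB_data": True,
}


def is_possible_response(request, response):
    """True if Table 1 lists `response` as a possible reply to `request`."""
    return response in POSSIBLE_RESPONSES.get(request, ())


def carries_data(response):
    """Return the DATA? column of Table 2, or None if the response is not listed."""
    return CARRIES_DATA.get(response)
```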

[0103] Advantageously, the use of a coherence transformer and the tightly-coupled request-response transactions
permits external devices, which may employ protocols different from the protocol on common bus 108 of computer
system 100, to share memory blocks which have local physical addresses within computer system 100. Further, the
explicit handshaking provided by the tightly-coupled request-response pairs makes this sharing possible even if the
external devices may each be operating at a different operating speed from that on common bus 108.

[0104] In a system in which coherence transformer 200 facilitates such memory block sharing, there is essentially
no effect on system performance within computer system 100 when an external device does not cache a memory
block. When an external device caches fewer memory blocks than there are tags in tag array 250 of coherence
transformer 200, the effect on the overall system performance is fairly minimal. This is because when there are fewer
externally-cached memory blocks than there are available tags in tag array 250, no additional transactions employing
common bus 108, i.e., those associated with the Mtag-only approach to allow external devices to cache memory blocks
without tracking them in snoop tag array 250, are required.

[0105] The latency in responding to outstanding memory access requests on common bus 108 is due in part to
the delay required for coherence transformer 200 to examine tags in tag array 250 to determine whether coherence
transformer 200 should intervene to service a memory access request on common bus 108.



Section B: Mtag-only approach: 

[0106] In the Mtag-only approach, i.e., the approach taken by step 508 of Fig. 8 and step 610 of Fig. 9, an externally
originated memory access request can be serviced even though there is no room in snoop tag array 250 for tracking
the externally cached memory block for the duration it is externally cached.

[0107] Figs. 12 and 13 show, in one embodiment of the Mtag-only approach, the memory access requests and
responses issued by a bus entity, e.g., any of the entities coupled to common bus 108 such as processing units 102,
104, 106 or coherence transformer 200. In the description that follows, it is assumed for simplicity of illustration that
there is only one bus entity internal to computer node 100, e.g., processing unit 102, being coupled to common bus
108. If more than one internal bus entity is coupled to common bus 108, e.g., both processing units 102 and
104 are present on common bus 108, coherence problems among these internal bus entities may be
resolved using any conventional method.

[0108] By way of example, one solution to such coherence problems involves requiring each internal bus entity to
snoop bus 108. If the snooped memory access request involves a memory block whose latest copy is cached by that
internal bus entity, that internal bus entity may intervene to respond to the outstanding memory access request before
memory module 110 may respond. An internal bus entity may ignore an outstanding memory access request if the
request does not involve a memory block cached by that internal bus entity. If no internal bus entity intervenes, memory
module 110 is implicitly responsible for responding with the copy it currently possesses.

[0109] Referring now to Figs. 12 and 13, a bus entity, e.g., processing node 102, may issue a memory access request
for an exclusive copy of memory block 112(a) by issuing a request to own (RTO) request. In the description that follows,
a request may have the form of request 400 of Fig. 5. On the other hand, a response may have the form of response
500 of Fig. 6.

[0110] If no other internal bus entity intervenes responsive to the RTO request, memory module 110 may respond
to the outstanding RTO request with a RTO_data to furnish the RTO progenitor with a copy of the requested memory
block from memory module 110, along with the state of that memory block (i.e., the content of the associated Mtag).
Coherence transformer 200, as a bus entity, may not intervene if there is not a match between the requested memory
block and one of the memory blocks tracked by tags in snoop tag array 250. If the RTO request is erroneous, e.g.,
requesting a non-existent memory block, memory module 110 may reply with a RTO_nack response, signifying that
the RTO request is not acknowledged and needs to be retried by the RTO progenitor.

[0111] Once the RTO_data response is received by the RTO progenitor from memory module 110, i.e., by processing
unit 102 in this example, the RTO progenitor then examines the state of the enclosed Mtag to determine whether the
current copy of the memory block received from memory module 110 can be employed to service the issued RTO
request. If the state is gI, for example, it is understood that an external device currently has the exclusive copy of the
memory block, and the RTO progenitor may issue a request to obtain that copy and invalidate all external copies via
the remote RTO memory access request (RRTO). Details regarding the RTO and RRTO requests, as well as other
requests described herein, are discussed more fully herein, particularly with reference to Fig. 12.

[0112] If the Mtag state is gS, at least one external bus entity had a shared, read-only copy. In this case, it will be
necessary to invalidate all shared copies existing internally and externally, and respond to the outstanding RTO request
with the latest copy. If the state is gM, one of the internal entities has the latest valid copy and the RTO progenitor may
proceed to employ the data returned in the RTO_data response from memory module 110 to satisfy its RTO needs
(since it is assumed herein that there is no other internal entity to intervene with a later copy).

[0113] A remote RTO (RRTO) memory access request is typically issued by a RTO progenitor after that RTO
progenitor finds out, by ascertaining the state of the Mtag received from memory module 110, that the state of the Mtag
is insufficient to service the current RTO request. Insufficient Mtag states in this case may be gS or gI, i.e., there may
be a shared or exclusive copy of the requested memory block existing externally. If the RRTO is issued by the RTO
progenitor responsive to a gM Mtag, coherence transformer 200 understands this to be an error condition (since state
gM indicates that the internal domain, not the external domain, currently has the exclusive copy of the requested
memory block) and may request the RRTO progenitor to retry to obtain the exclusive copy from the internal domain.
[0114] If the RRTO is issued by the RTO progenitor responsive to a gS Mtag, coherence transformer 200 may respond
to this RRTO command by invalidating external shared copy or copies, obtaining the latest copy of the requested
memory block either from the external domain or the internal domain, invalidating all internal shared copy or copies,
and returning that copy to the RRTO progenitor via the RTOR_data response. If the RRTO is issued by the RTO
progenitor responsive to a gI Mtag, coherence transformer 200 may respond to this RRTO command by obtaining the
external exclusive copy, invalidating that external exclusive copy, and returning that copy to the RRTO progenitor via
the RTOR_data response. Further, coherence transformer 200 may perform a write back to memory module 110 to
change the state of the Mtag corresponding to the requested memory block to gM via the RTOR response. If the RRTO
request is erroneous, e.g., requesting a non-existent memory block, coherence transformer 200 may reply with a
RTOR_nack response, signifying that the RRTO request is not acknowledged and needs to be retried by the RRTO
progenitor.

[0115] A bus entity, e.g., processing node 102, may issue a memory access request for a shared, read-only copy of
memory block 112(a) by issuing a RTS request. If no other internal bus entity intervenes, memory module 110 may
respond to the outstanding RTS request with a RTS_data to furnish the RTS progenitor with a copy of the requested
memory block from memory module 110, along with the state of that memory block (i.e., the content of the associated
Mtag). Coherence transformer 200, as a bus entity, may not intervene if there is not a match between the requested
memory block and one of the memory blocks tracked by tags in snoop tag array 250. If the RTS request is erroneous,
e.g., requesting a non-existent memory block, memory module 110 may reply with a RTS_nack response, signifying
that the RTS request is not acknowledged and needs to be retried by the RTS progenitor.

[0116] Once the RTS_data response is received by the RTS progenitor from memory module 110, i.e., processing unit
102 in this example, the RTS progenitor then examines the state of the enclosed Mtag to determine whether the current
copy of the memory block received from memory module 110 can be employed to service the current RTS need.
Generally, if the state of the Mtag is gS, at least one internal bus entity currently has a shared, read-only copy and this
RTS memory access request can be serviced either by another internal bus entity or by the data received from memory
module 110 itself. If the state of the Mtag is gM, at least one internal bus entity currently has an exclusive copy and
this RTS memory access request can be serviced either by another internal bus entity or by the data received from
memory module 110 itself.
[0117] If the state is gI, it is understood that an external device currently has the exclusive copy of the memory block
and the RTS progenitor may issue a request to obtain that copy via the remote RTS memory access request (RRTS).
If for some reason the RRTS is issued by the RTS progenitor responsive to a gM or gS Mtag, coherence transformer
200 understands this to be an error condition and will request the RTS progenitor to retry to obtain the shared copy
from the internal bus entities. If the RRTS is issued by the RTS progenitor responsive to a gI Mtag, coherence
transformer 200 may respond to this RRTS command by obtaining the shared copy of the requested memory block from
the external device and returning that copy to the RRTS progenitor via the RTSR_data response. Further, coherence
transformer 200 performs a write back to memory module 110 to change the state of the Mtag corresponding to the
requested memory block to gS (via the RTSR response). If the RRTS request is erroneous, e.g., requesting a
non-existent memory block, coherence transformer 200 may reply with a RTSR_nack response, signifying that the RRTS
request is not acknowledged and needs to be retried by the RRTS progenitor.
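
The choice an RTO or RTS progenitor makes after reading the Mtag returned by memory module 110 can be sketched as follows. This is a minimal illustration under the single-internal-entity assumption stated above. The function name `progenitor_next_step` and the label `use_data` are hypothetical and stand for "service the request with the data already received".

```python
def progenitor_next_step(request, mtag):
    """Decide what a progenitor does after receiving RTO_data or RTS_data.

    request: "RTO" or "RTS"; mtag: the Mtag state "gM", "gS", or "gI".
    """
    if mtag not in ("gM", "gS", "gI"):
        raise ValueError(f"unknown Mtag state: {mtag!r}")
    if request == "RTO":
        # Only gM is sufficient for an exclusive copy; gS and gI mean a
        # shared or exclusive copy may exist externally.
        return "use_data" if mtag == "gM" else "RRTO"
    if request == "RTS":
        # gM and gS can be serviced internally; gI means the exclusive
        # copy is held externally.
        return "RRTS" if mtag == "gI" else "use_data"
    raise ValueError(f"unsupported request: {request!r}")
```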

[0118] Either one of the processing nodes, e.g., processing node 102, or coherence transformer 200 (on behalf of
an external device) may issue a write back (WB) request to write back to memory module 110 an exclusive copy of a
memory block it earlier cached. If the WB request is erroneous, e.g., requesting a non-existent memory block, memory
module 110 may reply with a WB_nack response, signifying that the WB request is not acknowledged and needs to be
retried by the WB progenitor.

[0119] On the other hand, if no WB_nack response is issued, the WB progenitor may follow up with a WB_data
response to write back the memory block to memory module 110. Further, the state of the Mtag in memory module
110 may also be changed to gM (if coherence transformer 200 requests the write back) to reflect the fact that the
internal domain now has the exclusive copy of this memory block.

[0120] As mentioned earlier, when there is a remote memory access request, e.g., an RRTO or a RRTS, on common 
bus 108, coherence transformer 200 (via coherence transformer link 220) receives this memory access request and 
formulates an appropriate response depending on the state of the Mtag. The operation of the coherence transformer 
200 may be more clearly understood with reference to Figs. 14 and 15. 

[0121] Fig. 14 illustrates, in one embodiment of the present invention, selected transactions performed by coherence
transformer 200 in response to remote memory access requests on common bus 108. Referring now to Fig. 14, when
a remote memory access request is issued by one of the internal bus entities on common bus 108, this remote memory
access request is forwarded to all bus entities, including coherence transformer 200. The remote request may be,
however, ignored by all internal bus entities, e.g., processor 102. Responsive to the remote request, coherence
transformer 200 ascertains the current state of the Mtag (included in the remote request) to determine whether one of the
external devices has an appropriate copy of the requested memory block for responding to the remote memory access
request on common bus 108. The ascertaining of the Mtag state is necessary since it has been determined, in step
504 of Fig. 8, that there is no match between the requested memory block and one of the memory blocks tracked in
snoop tag array 250.

B1. Remote Request to Own (RRTO) 

[0122] If the remote memory access request is a request for an exclusive copy of a memory block (a RRTO) and the
current Mtag state is gM, coherence transformer 200 understands this to be an error condition (since state gM indicates
that the internal domain, not the external domain, currently has the exclusive copy of the requested memory block)
and may request the RRTO progenitor to retry to obtain the exclusive copy from the internal domain.
[0123] On the other hand, the RRTO may be issued by the RTO progenitor in response to a gS or a gI Mtag. This
occurs when the external domain is currently caching a valid copy of the requested memory block and there is no room
in snoop tag array 250 for tracking this externally cached memory block. Consequently, coherence transformer 200
would not be able to intervene to respond when the original RTO (issued by the RTO progenitor pertaining to this
memory block) was present on common bus 108. With reference to Fig. 8, coherence transformer 200 finds no tag
match in step 504 and proceeds to let memory module 110 respond (in step 508 of Fig. 8 in accordance with the
Mtag-only approach).

[0124] If the RRTO is issued by the RTO progenitor responsive to a gS Mtag, coherence transformer 200 may respond
to this RRTO command by invalidating external shared copy or copies by issuing the X-protocol invalidate command
XINV to request all external devices to invalidate their shared copies. Coherence transformer 200 may either broadcast
the X-protocol commands or may simply direct the X-protocol command to the appropriate external device(s) if there
is provided logic with coherence transformer 200 for keeping track of the locations and types of memory blocks cached.
[0125] When all external copies have been invalidated (confirmed by the receipt of the X-protocol XINV_ack
response), coherence transformer 200 may then obtain the latest copy of the requested memory block from the internal
domain and invalidate all internal shared copy or copies. In one embodiment, coherence transformer 200 may obtain
the latest copy of the requested memory block from the internal domain and invalidate all internal shared copy or copies
by issuing a RTO request to common bus 108. Upon receiving the requested copy from the internal domain (via the
RTO_data response), coherence transformer 200 may write back the copy to memory module 110 along with the
appropriate Mtag, i.e., gM in this case, via the RTOR response. Thereafter, coherence transformer 200 may provide
the requested copy to the RRTO progenitor via the RTOR_data response.

[0126] Note that the use of the XINV command advantageously invalidates all shared copies of the requested memory
block cached by the external device(s). Further, the use of the RTO request by coherence transformer 200 to common
bus 108 advantageously ensures that all internal shared copies within computer node 100 are invalidated and obtains
the required memory block copy to forward to the requesting entity, i.e., the RRTO progenitor.
[0127] If the RRTO request is issued by the RTO progenitor responsive to a gI Mtag, coherence transformer 200
may respond to this RRTO command by obtaining the external exclusive copy and invalidating that external exclusive
copy via the X-protocol XRTO request. When the external exclusive copy is obtained (via the X-protocol XRTO_data
response), coherence transformer 200 may perform a write back to memory module 110 to change the state of the
Mtag corresponding to the requested memory block to gM via the RTOR response. Further, coherence transformer
200 may return the copy of the requested memory block to the RRTO progenitor via the RTOR_data response.

B2. Remote Request to Share (RRTS)

[0128] If the remote memory access request is a request for a shared copy of a memory block (a RRTS) and the
current state of the Mtag is gM or gS, coherence transformer 200 understands this to be an error condition (since these
states indicate that there is at least one valid, i.e., shared or exclusive, copy internally) and will request the RTS
progenitor to retry to obtain the shared copy from the internal bus entities. If the RRTS is issued by the RTS progenitor
responsive to a gI Mtag, coherence transformer 200 may respond to this RRTS command by obtaining the shared
copy of the requested memory block from the external device (via the X-protocol XRTS request). When the external
shared copy is obtained (via the X-protocol XRTS_data response), coherence transformer 200 may perform a write
back to memory module 110 to change the state of the Mtag corresponding to the requested memory block to gS via
the RTSR response. Further, coherence transformer 200 may return the copy of the obtained memory block to the
RRTS progenitor via the RTSR_data response.
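
The responses of coherence transformer 200 to remote requests in sections B1 and B2 may be sketched as a lookup on the request type and the Mtag state carried in the request. The sketch below is illustrative only: the function name `respond_to_remote_request` and the `retry` label (standing for "ask the progenitor to retry in the internal domain") are hypothetical, and the transaction lists follow the order in which the text names them.

```python
def respond_to_remote_request(request, mtag):
    """Return (transactions, new_mtag) for an RRTO or RRTS in the Mtag-only approach.

    new_mtag is None when the Mtag is left unchanged (error/retry case).
    """
    if request == "RRTO":
        if mtag == "gM":
            # The internal domain already holds the exclusive copy.
            return ["retry"], None
        if mtag == "gS":
            # Invalidate external shared copies, then obtain and invalidate
            # internal copies with an RTO before replying.
            return ["XINV", "XINV_ack", "RTO", "RTO_data", "RTOR", "RTOR_data"], "gM"
        if mtag == "gI":
            # Obtain and invalidate the external exclusive copy.
            return ["XRTO", "XRTO_data", "RTOR", "RTOR_data"], "gM"
    elif request == "RRTS":
        if mtag in ("gM", "gS"):
            # A valid copy exists internally.
            return ["retry"], None
        if mtag == "gI":
            return ["XRTS", "XRTS_data", "RTSR", "RTSR_data"], "gS"
    raise ValueError(f"unsupported combination: {request!r}, {mtag!r}")
```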

[0129] Coherence transformer 200 not only interacts with the processing nodes within computer node 100 to
respond to remote memory access requests issued by those processing nodes, it also interacts with the external devices,
e.g., external devices 202, 204, and 206, in order to service memory access requests pertaining to memory blocks
having local physical addresses within computer node 100.

[0130] Fig. 15 illustrates selected transactions performed by coherence transformer 200 in response to memory
access requests from one of the external devices. In Fig. 15, the memory access requests are issued, using the
aforementioned generalized X-protocol, by one of the external devices, e.g., one of devices 202, 204, or 206, to coherence
transformer 200. If another external device currently caches the required copy of the requested memory block, this
memory access request is preferably handled by logic circuitry provided with coherence transformer 200 without
requiring the attention of coherence transformer 200 itself.

[0131] On the other hand, if another external device does not have the valid copy of the requested memory block to
service the memory access request, coherence transformer 200 then causes a memory access request to appear on
common bus 108, using a protocol appropriate to computer node 100, so that coherence transformer 200 can obtain
the required copy of the requested memory block on behalf of the requesting external device. Further, since a copy of
the memory block is now cached by an external device, and it has been determined in step 606 of Fig. 9 that there is
no additional room in snoop tag array 250 to track this externally requested memory block for the entire duration of it
being externally cached, the Mtag associated with this memory block may need to be changed in memory module 110
to reflect this change, i.e., the invention proceeds to handle this externally originated memory access request using
the Mtag-only approach.

B3. XRTO Memory Access Request 

[0132] Referring now to Fig. 15, when an external device issues a memory access request to obtain an exclusive
copy of a memory block having a local physical address within computer node 100, e.g., memory block 112(a), it issues
a XRTO request to coherence transformer 200. Coherence transformer 200 then obtains the copy of the requested
memory block from the internal domain and invalidates all internal copies of the requested memory block (by issuing a
RTO request to common bus 108). After receiving the copy of the requested memory block, coherence transformer
200 then ascertains the state of the associated Mtag to determine its next course of action.
[0133] If the state of the Mtag (contained in the RTO_data response) is gI, coherence transformer 200 understands
this to be an error since the external domain does not have the exclusive copy (otherwise it would not need to request
the exclusive copy from the internal domain) and the internal domain does not have either a shared or exclusive copy
(gI Mtag state). The error condition may be handled using a variety of conventional techniques, e.g., flagging the error
and/or performing a software or hardware reset.

[0134] On the other hand, if the state of the Mtag is gM or gS, coherence transformer 200 then writes back to memory
module 110 (via the WB request and WB_data response) the new state, i.e., gI, to signify that there is no longer a valid
copy of the requested memory block in the internal domain. In one embodiment, the write back may be performed with
only the new state gI and without any other data for the requested memory block to save bandwidth on common bus
108 (since any data associated with an invalid Mtag state would be ignored anyway). Thereafter, coherence transformer
200 may forward the copy of the obtained memory block to the requesting external device via the X-protocol XRTO_data
response.

B4. XRTS Memory Access Request

[0135] When an external device issues a memory access request to obtain a shared copy of a memory block having
a local physical address within computer node 100, e.g., memory block 112(a), it issues a XRTS request to coherence
transformer 200. Coherence transformer 200 then obtains the copy of the requested memory block from the internal
domain and writes the gS state to memory module 110 (by issuing a RTSM request to common bus 108 and receiving
the RTSM_data response). If the state of the Mtag is gI, coherence transformer 200 typically would receive a response
from the memory module with Mtag gI. If the response is received and the Mtag state contained in the RTSM_data
response is gI or, for some reason, there is no response, coherence transformer 200 understands this to be an error
since the external domain does not have the exclusive copy (otherwise it would not need to request the exclusive copy
from the internal domain) and the internal domain does not have either a shared or exclusive copy (gI Mtag state). The
error condition may be handled using a variety of conventional techniques, e.g., flagging the error and/or performing a
software or hardware reset.

[0136] On the other hand, if the state of the Mtag is gM or gS, coherence transformer 200 may forward the copy of
the obtained memory block to the requesting external device via the X-protocol XRTS_data response.

[0137] Note that the RTSM and RTSM_data sequence may equally be substituted by a sequence containing RTO
(from coherence transformer 200 to common bus 108), RTO_data (from common bus 108 to coherence transformer
200), WB (from coherence transformer 200 to common bus 108 to ask permission to write to memory module 110),
and WB_data (writing the gS Mtag to the corresponding memory block in memory module 110).
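
The XRTS handling described in this section can be sketched as follows. This is only an illustrative model, not the claimed implementation: the dictionary standing in for memory module 110, the names `handle_xrts` and `CoherenceError`, and the choice to check the old Mtag before recording gS are all assumptions made for the example.

```python
from enum import Enum


class Mtag(Enum):
    gM = "gM"  # block is exclusive to the computer node
    gS = "gS"  # block is shared with the external domain
    gI = "gI"  # block is invalid in the computer node


class CoherenceError(Exception):
    """Hypothetical exception for the error condition described in the text."""


def handle_xrts(memory, addr):
    """Service an XRTS request; `memory` maps address -> (data, Mtag)."""
    entry = memory.get(addr)  # stands in for the RTSM / RTSM_data exchange
    if entry is None or entry[1] is Mtag.gI:
        # No response, or gI: neither domain holds a usable copy.
        raise CoherenceError(f"XRTS for {addr:#x}: Mtag is gI or no response")
    data, _old_tag = entry
    memory[addr] = (data, Mtag.gS)  # record that the block is now shared
    return data  # payload of the XRTS_data response
```

A caller would treat `CoherenceError` as the point at which to flag the error or trigger a reset, as the text suggests.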

B5. XWB Request

[0138] When an external device issues a request to write back an exclusive copy of a memory block it earlier cached
from computer node 100, it issues an X-protocol XWB request to coherence transformer 200. In accordance with the
Mtag-only approach, coherence transformer 200 may then obtain a copy of the requested memory block from the
internal domain to ascertain the current state of the associated Mtag. If the current state is gM or gS, coherence
transformer 200 understands this to be an error since the external domain, which requests to write back, must have
the only valid, exclusive copy and there must be no other valid (whether exclusive or shared) copy of the same memory
block anywhere else in the computer system. The error condition may be handled using a variety of conventional
techniques, e.g., flagging the error and/or performing a software or hardware reset.

[0139] On the other hand, if the state of the Mtag is gI, coherence transformer 200 then proceeds to receive from
the external device the data to be written back (via the X-protocol XWB_data response) and writes this data, along
with the new gM Mtag state, to the appropriate memory location in memory module 110. In one embodiment, the writing
of both the data and the gM Mtag state can be accomplished by issuing a WSgM command to common bus 108, which
requests the writing of both data and new Mtag, to be followed by the data and the new gM Mtag in the WSgM_data
command.

[0140] Note that the WSgM and WSgM_data sequence may well be substituted by a sequence containing RTO (from
common bus 108 on behalf of memory module 110 to coherence transformer 200), RTO_data (from coherence
transformer 200 to common bus 108 to furnish the old data to be overwritten from memory module 110), WB (from
coherence transformer 200 to common bus 108 to ask permission to write to memory module 110), and WB_data
(writing the gM Mtag to the corresponding memory block in memory module 110).
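
As an illustration of the XWB handling above, the sketch below checks the Mtag and then writes the data and the gM state together. It is a simplified model under stated assumptions: memory module 110 is represented by a dictionary, Mtag states by plain strings, and `handle_xwb` and `CoherenceError` are hypothetical names.

```python
class CoherenceError(Exception):
    """Hypothetical exception for the error condition described in the text."""


def handle_xwb(memory, addr, external_data):
    """Service an XWB request; `memory` maps address -> (data, Mtag string)."""
    _stale_data, mtag = memory[addr]  # read the current Mtag state
    if mtag in ("gM", "gS"):
        # A valid internal copy contradicts an exclusive external copy.
        raise CoherenceError(f"XWB for {addr:#x} while Mtag is {mtag}")
    # Mtag is gI: accept the XWB_data payload and write data plus gM together,
    # mirroring what the WSgM / WSgM_data pair does in the described embodiment.
    memory[addr] = (external_data, "gM")
```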

[0141] In accordance with the Mtag-only approach, the use of a coherence transformer and the tightly-coupled
request-response transactions advantageously permits external devices, which may be employing protocols different
from the protocol on common bus 108 of computer node 100, to share memory blocks having local physical addresses
within computer node 100. Further, coherence transformer 200 makes this sharing possible even if the external devices
each operate at a speed different from that of common bus 108.

[0142] Note that the external devices do not need to accommodate Mtags to participate in memory sharing. Only 
the bus entities, e.g., memory module 110, the processors coupled to common bus 108, and coherence transformer 
200, need to be aware of the existence of Mtags to employ them in avoiding coherence problems. Consequently, this 
feature of coherence transformer 200 advantageously permits a properly configured computer node 100 to work with
a wide range of existing external devices to facilitate memory sharing without requiring any modification to the external 
devices. 

[0143] The Mtag-only approach advantageously permits the external devices to cache any number of memory blocks.
Due to the existence of Mtags, coherence transformer 200 advantageously does not need to keep track of every
memory block currently cached by the external devices for the purpose of deciding whether coherence transformer
200 should intervene in servicing a memory access request on common bus 108. This is in contrast with the
snoop-only approach, which requires a tag to track every externally cached memory block and which is employed
herein only when there is still room in snoop tag array 250 to track the externally cached memory blocks. When there
is no more room in snoop tag array 250 to track the externally cached memory blocks, the Mtag-only approach can
advantageously be employed to facilitate the caching of memory blocks by external devices in a coherent manner.

[0144] In accordance with one aspect of the inventive Mtag-only approach, the bus entity that obtains the memory
block from memory module 110 decides for itself, upon ascertaining the Mtag state of the obtained memory block,
whether it needs to further request a more recent copy from the external device (via the remote requests RRTO and
RRTS directed at coherence transformer 200).

[0145] In one embodiment, coherence transformer 200 in the Mtag-only approach is provided with at least one buffer
block, e.g., buffer 280 of Fig. 7A, for temporarily storing a copy of the memory block most recently accessed by one
of the external devices. The buffer block may store both the address of the memory block and the relevant Mtag data
(or alternatively the external states eS, eI, or eM since Mtags and external states can be derived from one another).
The buffer block advantageously permits coherence transformer 200 to perform a write back to memory module 110 to
change the state of the Mtag in memory module 110.
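
The remark that Mtags and external states can be derived from one another can be made concrete with a small lookup table. The correspondence used here (global exclusive with external invalid, global shared with external shared, global invalid with external exclusive) is the one recited in the claims; the Python names are illustrative only.

```python
# Global (Mtag) state -> external state, per the correspondence in the claims.
GLOBAL_TO_EXTERNAL = {
    "gM": "eI",  # exclusive to the node -> invalid in the external domain
    "gS": "eS",  # shared with the external domain -> shared externally
    "gI": "eM",  # invalid in the node -> exclusive to the external domain
}
EXTERNAL_TO_GLOBAL = {ext: glob for glob, ext in GLOBAL_TO_EXTERNAL.items()}


def external_state(mtag):
    """Derive the external state from an Mtag state."""
    return GLOBAL_TO_EXTERNAL[mtag]


def global_state(ext):
    """Derive the Mtag state from an external state."""
    return EXTERNAL_TO_GLOBAL[ext]
```

Because the mapping is one-to-one, a buffer block may store either representation without losing information.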

[0146] While operating in the Mtag-only approach, in the interval after coherence transformer 200 obtains the copy
of the memory block requested and before coherence transformer 200 performs a write back to change the Mtag,
e.g., responsive to an XRTO request from an external device, coherence transformer 200 may, using the data stored
in the buffer, monitor common bus 108 to intervene. The intervention may be necessary if, for example, another
internal bus entity requests this memory block during the aforementioned interval.

[0147] Note that once the write back is performed to change the Mtag to the appropriate state, it is no longer necessary
to keep a copy of that memory block in the buffer. Because a copy of a memory block is typically kept in a buffer for a 
very short time, the number of buffers required may be quite small. 
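
The role of the buffer during the interval between obtaining a block and writing back its new Mtag might be modeled as below. This is a rough sketch under assumptions: the class name `WriteBackBuffer`, its methods, and the default capacity of two entries are hypothetical, since the text states only that the number of buffers may be quite small.

```python
class WriteBackBuffer:
    """Tracks blocks whose new Mtag has not yet reached memory module 110."""

    def __init__(self, capacity=2):
        self.capacity = capacity  # hypothetical size; the text gives no number
        self.pending = {}         # address -> Mtag state awaiting write back

    def hold(self, addr, new_mtag):
        """Record a block obtained for an external device."""
        if len(self.pending) >= self.capacity:
            raise RuntimeError("no free buffer block")
        self.pending[addr] = new_mtag

    def must_intervene(self, addr):
        """True if a snooped bus request targets a block still in flight."""
        return addr in self.pending

    def write_back_done(self, addr):
        """Release the buffer block once the Mtag write back has completed."""
        self.pending.pop(addr, None)
```

A block occupies the buffer only between `hold` and `write_back_done`, which is why a small capacity can suffice.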

[0148] Further, since a response to an externally-originated memory access request, e.g., XRTO, XRTS or XWB,
requires knowledge of the state of the corresponding Mtag, there is optionally provided, as an optimization technique
in one embodiment, an Mtag cache array for tracking some or all memory blocks of memory module 110. For example,
an Mtag cache array may be provided to track only the Mtag states of the memory blocks externally cached. Alternatively,
an Mtag cache array may be employed to track the Mtag states of every memory block in memory module 110.

[0149] As another embodiment, an Mtag cache array may be provided to track only memory blocks whose Mtag
states are gS and gI. This embodiment is particularly advantageous in computer systems in which a relatively small
number of memory blocks are externally cached at any given time. In such a computer system, most memory blocks
would have a gM state, and relatively few would have gS and gI Mtag states.

[0150] When coherence transformer 200 requires knowledge of the Mtag state associated with a given memory
block, it checks the Mtag cache array first. In case of a cache hit, no bandwidth of common bus 108 is required to
ascertain the Mtag state. In case of a cache miss, coherence transformer 200 may proceed to query, via common
bus 108 as discussed herein, the state of the associated Mtag to determine its proper course of action. Note that the
presence of an Mtag cache array is not absolutely necessary, and it is equally acceptable to have an implementation
wherein no Mtag caching is performed (in which case coherence transformer 200 queries, via common bus 108, the
Mtag state when it needs this information).
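
The hit and miss behavior of the optional Mtag cache array can be sketched as follows. The class `MtagCache`, the `query_bus` callable, and the policy of filling the cache on a miss are assumptions made for illustration; the text specifies only that a hit avoids using common bus 108.

```python
class MtagCache:
    """Illustrative Mtag cache consulted before going to the common bus."""

    def __init__(self, query_bus):
        self.query_bus = query_bus  # callable: address -> Mtag state via the bus
        self.entries = {}           # address -> cached Mtag state
        self.bus_queries = 0        # counts uses of common bus bandwidth

    def lookup(self, addr):
        if addr in self.entries:
            return self.entries[addr]  # hit: no bus bandwidth is consumed
        self.bus_queries += 1          # miss: ask over the common bus
        state = self.query_bus(addr)
        self.entries[addr] = state     # assumption: fill the cache on a miss
        return state

    def update(self, addr, state):
        """Keep the cached state in step with an Mtag write back."""
        self.entries[addr] = state
```

For example, two successive lookups of the same address would consume bus bandwidth only once in this model.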

[0151] As is apparent from the foregoing, when there is sufficient room in snoop tag array 250 of snoop coherence
transformer 200 to keep track of externally cached memory blocks, the embodiment advantageously operates in the
snoop-only mode. In this mode, the performance of the hybrid approach described herein is as efficient as that of the
snoop techniques. Note that in the snoop-only approach, when an external device caches a memory block, there is
advantageously no need to write back to memory module 110 the new Mtag. In this manner, common bus 108 is only
employed once to furnish the requested memory block to coherence transformer 200 to service the externally originated
memory access request, and there is no need to use common bus 108 again to write the new Mtag to memory module
110, as would be required in the Mtag-only approach.

[0152] When there is no room left in snoop tag array 250 to keep track of externally cached memory blocks, the
embodiment advantageously switches to the Mtag-only approach. In this approach, there is no need to track the
externally cached memory block in snoop tag array 250.

[0153] Although additional bandwidth on common bus 108 is required to write back the new Mtag state to memory
module 110, the Mtag-only approach is still an advantageous mode of operation when there is no room left in snoop
tag array 250. This is because the Mtag-only approach, unlike the snoop approach, does not require the forcible write
back of a previously externally cached memory block for the purpose of freeing up a tag in snoop tag array 250
in order to track the newly cached memory block. The forcible write back of a memory block that was externally cached
earlier is a time-consuming operation since snoop coherence transformer 200 must decide which of the multiple
externally cached memory blocks should be written back, and must go out to the external device to invalidate the
externally cached copy before writing it back to memory module 110. Such an action is required in the snoop-only
approach whenever there is no free tag in snoop tag array 250, since the snoop-only approach requires that each
externally cached memory block be tracked by a tag in snoop tag array 250, and unless a forcible write back is
performed on a previously externally cached memory block to unallocate a tag, there is no tag in snoop tag array 250
to service the new externally requested memory block.

[0154] As is apparent, the exact number of tags in snoop tag array 250 depends on the needs of a particular system.
In general, as many tags as reasonably possible should be provided in snoop tag array 250 to defer the need to operate
in the Mtag-only approach. Note that the tags are recycled since when an external device writes back a memory block
that is tracked in snoop tag array 250, the tag that is employed to track this externally cached memory block is then
freed up, allowing coherence transformer 200 to service the next externally originated memory access request using
the snoop-only approach.
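
The choice between the two approaches, together with the recycling of tags on write back, can be illustrated with the following sketch. It is a simplified model, not the claimed logic: the class `SnoopTagArray`, its method names, and the use of a dictionary for the tags are assumptions.

```python
class SnoopTagArray:
    """Illustrative finite snoop tag array with fallback to Mtag-only."""

    def __init__(self, num_tags):
        self.num_tags = num_tags
        self.tags = {}  # address -> external state ("eS" or "eM")

    def service_external_request(self, addr, external_state):
        """Return the approach used to service an external caching request."""
        if addr in self.tags or len(self.tags) < self.num_tags:
            self.tags[addr] = external_state  # tag tracks the block while cached
            return "snoop-only"
        return "Mtag-only"  # no free tag: rely on the Mtag in memory instead

    def external_write_back(self, addr):
        """Free the tag, if any, when the external device writes the block back."""
        return self.tags.pop(addr, None) is not None
```

With a single tag, a first request is serviced snoop-only, a second falls back to Mtag-only, and a request arriving after the first block is written back can again be serviced snoop-only.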

[0155] Note that the embodiment, with a finite number of tags in snoop tag array 250, takes advantage of the optimum
operating range of the snoop approach while avoiding its less efficient operating point. The embodiment advantageously
operates in the optimum operating range of the snoop approach when there are tags in snoop tag array 250 to track
externally cached memory blocks, thereby avoiding the inefficiency associated with the Mtag-only approach in this
operating range (i.e., the need of the Mtag approach to write the Mtag back to memory module 110).

[0156] When there are no more tags in snoop tag array 250 to track externally cached memory blocks, the
embodiment advantageously avoids the more inefficient operating mode associated with the snoop approach for this
operating range (i.e., the mode wherein forcible write backs of memory blocks externally cached in previous cycles
are necessitated). In this operating range, the embodiment advantageously switches to an Mtag-only operating mode,
thereby allowing the system to operate at a relatively higher efficiency.

[0157] The embodiment has been described as allowing one coherence transformer per bus. System designers may,
in some cases, want to attach several coherence transformers to a bus to connect many alternative devices of the same
or different types, e.g., I/O devices, DSM memory agents, coherence domain devices, and the like. The implementation
of multiple coherence transformers would be apparent to those skilled in the art given this disclosure. In a multiple
coherence transformer implementation, Mtags may be extended with a field to identify which coherence transformer
has the block externally so that processors know which coherence transformer should receive the appropriate RRTO's
and RRTS's.
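
One way to picture an Mtag extended with a coherence transformer identifier is shown below. The text does not specify the layout of the extra field, so the `ExtendedMtag` record, the `transformer_id` field, and the `route_remote_request` helper are purely hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtendedMtag:
    state: str                            # "gM", "gS" or "gI"
    transformer_id: Optional[int] = None  # hypothetical field naming the holder


def route_remote_request(mtag, request, transformers):
    """Pick the coherence transformer that should receive an RRTO or RRTS."""
    if request not in ("RRTO", "RRTS"):
        raise ValueError(f"not a remote request: {request}")
    if mtag.transformer_id is None:
        raise LookupError("block is not held through any coherence transformer")
    return transformers[mtag.transformer_id]
```

Here `transformers` is assumed to map identifiers to whatever object represents each coherence transformer on the bus.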

[0158] It should also be noted that there are many alternative ways of implementing the methods and apparatuses
of the present invention as claimed. By way of example, some systems may create an illusion of a common bus without
requiring a physical bus (e.g., via a set of broadcast wires). The KSR-1 from Kendall Square Research of Massachusetts
is one such example. The present invention applies equally well to these and other analogous systems.

Claims

1. A method of enabling an external device (202, 204, 206) in an external domain that is external to a computer node
(100) of a computer system to share memory blocks (112) having local physical addresses in a memory module
(110) at said computer node irrespective of whether said external device and a common bus (108) at said computer
node both employ a common protocol and irrespective of whether said external device and said common bus both
operate at the same speed, said computer node including a coherence transformer (200), said memory module
and a processing node connected to said common bus, said processing node (102, 104, 106) having a processor
(116) and a cache (114), each of said memory blocks having an associated Mtag for tracking a global state
associated with each memory block, including a global exclusive state for indicating that each memory block is exclusive
to said computer node, a global shared state for indicating that each memory block is shared by said computer
node with said external device, and a global invalid state for indicating that each memory block is invalid in said
computer node, said method comprising:

snooping said common bus to monitor memory access requests on said common bus;

receiving, at the coherence transformer, a first memory access request for caching a first memory block from
said external device;

obtaining a first copy of said first memory block, using said coherence transformer, from said common bus,
characterised in that said coherence transformer has a snoop tag array (250) having a plurality of snoop
tags, each of said plurality of snoop tags being configured to identify one of said memory blocks if cached by 
said external device and to track an external state of a copy of that memory block, said external state including 
one of an external exclusive state for indicating that said copy of that memory block is exclusive to said external 
domain, an external shared state for indicating that said copy of that memory block is shared by said external 
domain, and an external invalid state for indicating that said copy of that memory block is invalid in said external 
domain; and 

if at least one tag in said plurality of snoop tags is available for tracking said external state of said first copy 
of said first memory block, responding to said first memory access request using a snoop-only approach in 
which that tag is used to track said external state of said first copy of said first memory block for an entire
duration that said first memory block is cached by said external device;

else if at least one tag in said plurality of snoop tags is not available for tracking said external state of said first 
copy of said first memory block, responding to said first memory access request using an Mtag-only approach 
in which, using said coherence transformer, a tag for said first memory block is temporarily stored until a global 
state associated with said first memory block can be written back into said memory module; 

said first copy of said first memory block being sent from said coherence transformer to said external device. 

2. The method of claim 1 wherein said first memory access request from said external device represents a request
for an exclusive copy of said first memory block and said step of responding to said first memory access request
using said Mtag-only approach further includes a step of changing said first Mtag in said memory module to a
global invalid state.

3. The method of claim 2 wherein said step of responding to said first memory access request using said Mtag-only
approach further comprises a step of invalidating all valid copies of said first memory block at said computer node.

4. The method of claim 1 wherein said first memory access request from said external device represents either a
request for an exclusive copy of said first memory block or a request for a shared copy of said first memory block, 
said step of responding to said first memory access request using said Mtag-only approach further includes the 
steps of: 

prior to said modifying step, examining said first Mtag associated with said first memory block; and 

proceeding with said modifying step and said sending step only if said first Mtag does not represent a global 
invalid state. 

5. The method of claim 1 wherein said first memory access request from said external device represents a request
for a shared copy of said first memory block and said step of responding to said first memory access request using 
said Mtag-only approach further includes a step of changing said first Mtag in said memory module to a global 
shared state. 

6. The method of claim 5 wherein said step of responding to said first memory access request using said Mtag-only
approach further includes the steps of: 

prior to said modifying step, examining said first Mtag associated with said first memory block; 

proceeding with said modifying step and said sending step only if said first Mtag does not represent a global 
invalid state; and 

if said first Mtag represents a global invalid state, flagging an error condition. 

7. The method of claim 1 further comprising the steps of:

receiving a write back request for a second memory block from said external device at said coherence
transformer;

obtaining said first copy of said second memory block, using said coherence transformer, from said external
device;

writing said first copy of said second memory block from said coherence transformer to said memory module
at said computer node; and

if said first copy of said first memory block is not tracked in a snoop tag of said snoop tag array, modifying, 
using said coherence transformer, an Mtag associated with said second memory block in said memory module 
at said computer node to reflect that said computer node has an exclusive copy of said second memory block. 

8. The method of claim 1 wherein said global state for said each of said memory blocks is employed as said external 
state for said each of said memory blocks, whereby a global exclusive state represents an external invalid state, 
a global shared state represents an external shared state, and a global invalid state represents an external exclu- 
sive state. 

9. The method of claim 1 further comprising the steps of:

receiving a write back request for said first memory block from said external device at said coherence
transformer;

obtaining said first copy of said first memory block, using said coherence transformer, from said external device;

writing said first copy of said first memory block from said coherence transformer to said memory module at
said computer node; and

if said first copy of said first memory block was tracked in a snoop tag of said snoop tag array prior to said
writing step, unallocating said snoop tag of said snoop tag array, thereby rendering said snoop tag available
for tracking other externally cached memory blocks and causing said first copy of said first memory block to
be no longer tracked by said snoop tag array.

10. The method of claim 1 further comprising the step of responding, through said coherence transformer, to a second
memory access request on said common bus on behalf of said external device, comprising:

monitoring memory access requests on said common bus, using said coherence transformer, to determine
whether a second memory access request of said memory access requests on said common bus pertains to
any one of memory blocks tracked in snoop tags of said snoop tag array; and

if said second memory access request pertains to a second memory block, said second memory block
representing said one of memory blocks tracked in said snoop tags of said snoop tag array, responding to said
second memory access request using said snoop-only approach, including responding to said second memory
access request using said coherence transformer.

11. The method of claim 10 wherein said coherence transformer responds to said second memory access request in
said snoop-only approach only if a snoop tag tracking said second memory block in said snoop tag array indicates
that a first copy of said second memory block is valid at said external device.

12. The method of claim 11 wherein said second memory access request is a request for an exclusive copy and said
snoop tag tracking said second memory block indicates that said first copy of said second memory block at said
external device is an exclusive copy of said second memory block, said step of responding to said second memory
access request using said snoop-only approach comprises:

obtaining, using said coherence transformer, a second copy of said second memory block from said first copy
of said second memory block at said external device;

invalidating said first copy of said second memory block at said external device;

forwarding said second copy of said second memory block from said coherence transformer to said common
bus to enable a progenitor of said second memory access request to obtain said second copy of said second
memory block; and

unallocating said snoop tag of said snoop tag array, thereby rendering said snoop tag available for tracking
other externally cached memory blocks and causing said second memory block to be no longer tracked by
said snoop tag array.

13. The method of claim 11 wherein said second memory access request is a request for an exclusive copy and said
snoop tag tracking said second memory block indicates that said first copy of said second memory block at said
external device is a shared copy of said second memory block, said step of responding to said second memory
access request using said snoop-only approach comprises:

invalidating said first copy of said second memory block at said external device;

obtaining, using said coherence transformer, a second copy of said second memory block from said computer
node via said common bus;

invalidating, using said coherence transformer, any valid copy of said second memory block in said computer
node;

forwarding said second copy of said second memory block from said coherence transformer to said common
bus to enable a progenitor of said second memory access request to obtain said second copy of said second
memory block; and

unallocating said snoop tag of said snoop tag array, thereby rendering said snoop tag available for tracking
other externally cached memory blocks and causing said second memory block to be no longer tracked by
said snoop tag array.

14. The method of claim 11 wherein said second memory access request is a request for a shared copy and said
snoop tag tracking said second memory block indicates that said first copy of said second memory block at said
external device is a shared copy of said second memory block, said step of responding to said second memory
access request using said snoop-only approach comprises:

obtaining, using said coherence transformer, a second copy of said second memory block from said computer
node via said common bus; and

forwarding said second copy of said second memory block from said coherence transformer to said common
bus to enable a progenitor of said second memory access request to obtain said second copy of said second
memory block.

15. The method of claim 11 wherein said second memory access request is a request for a shared copy and said
snoop tag tracking said second memory block indicates that said first copy of said second memory block at said
external device is an exclusive copy of said second memory block, said step of responding to said second memory
access request using said snoop-only approach comprises:

obtaining, using said coherence transformer, a second copy of said second memory block from said external
device;

changing said snoop tag tracking said second memory block to indicate that said first copy of said memory
block at said external device is a shared copy of said second memory block; and

forwarding said second copy of said second memory block from said coherence transformer to said common
bus to enable a progenitor of said second memory access request to obtain said second copy of said second
memory block.

16. A coherence transformer (200) for facilitating the sharing of memory blocks (112) between a computer node (100)
and an external device, said computer node including a common bus (108) to which said coherence transformer,
a memory module (110) and a processing node (102, 104, 106) with a processor and cache (114) are connected,
said memory blocks having local physical addresses in the memory module at said computer node, each of said
memory blocks having an associated Mtag for tracking a global state associated with each memory block, including
a global exclusive state for indicating that memory block is exclusive to said computer node, a global shared state
for indicating that memory block is shared by said computer node with said external device, and a global invalid
state for indicating that each memory block is invalid in said computer node, said coherence transformer
comprising:

snooping logic (260) configured for coupling with the common bus of said computer node, said snooping logic,
when coupled to said common bus, being operable to monitor memory access requests on said common bus;
said coherence transformer being characterised by comprising:

a snoop tag array (250) coupled to said snooping logic, said snoop tag array having a plurality of snoop
tags (273, 274, 276, 278, 280), each of said plurality of snoop tags being configured to identify one of said
memory blocks if cached by said external device and to track an external state of a copy of that memory
block, said external state including one of an external exclusive state for indicating that said copy of that
memory block is exclusive to said external domain, an external shared state for indicating that said copy
of that memory block is shared by said external domain, and an external invalid state for indicating
that said copy of that memory block is invalid in said external domain; and

logic means for ascertaining (504, 506, 508) whether a first memory access request from said external
device for caching a first memory block should be responded to using a snoop-only approach in which a
tag in said snoop tag array is operable to track said external state of a copy of said first memory block for
an entire duration that said first memory block is cached by said external device, or using an Mtag-only
approach in which a tag for said first memory block is temporarily stored until a global state associated
with said first memory block can be written back into said memory module.

17. The coherence transformer of claim 16 further comprising logic (606, 608, 610) for ascertaining whether a second
memory access for a second memory block on said common bus should be responded to using said snoop-only
approach or said Mtag-only approach, said second memory access being responded to by said coherence
transformer using said snoop-only approach when said second memory block is tracked by said snoop tag array, said
second memory block being responded to by said memory module using said Mtag-only approach when said
second memory access is not tracked by said snoop tag array.

18. A computer system having a computer node (100), said coherence transformer (200) of claim 16 or claim 17 and
an external device, said computer node including a common bus (108) to which said coherence transformer (200),
a memory module (110) and a processing node (102, 104, 106) with a processor (116) and a cache (114) are
connected.



Patent Claims (German-language version)

1. Verfahren zum Zulassen, dass eine externe Einrichtung (202, 204, 206) in einem externen Domain, das auGerhalb 
eines Compute rknotens (100) eines Computersystems liegt, Speicherblocke (112) mit lokalen physikalischen 
Adressen in einem Speichermodul (110) an dem Computerknoten gemeinsam zu benutzen unabhangig davon, 
ob die externe Einrichtung und ein gemeinsamer Bus (108) an dem Computerknoten beide ein gemeinsames 
Protokoll verwenden und unabhangig davon, ob die externe Einrichtung und der gemeinsame Bus beide mit der- 
same speed, wherein the computer node includes a coherence transformer (200), the memory module, and a
processing node connected to the common bus, wherein the processing node (102, 104, 106) has a processor (116)
and a cache (114), each of the memory blocks has an associated memory tag (Mtag) for tracking a global state
associated with each memory block, including a global exclusive state for indicating that the respective
memory block is exclusive to the computer node, a global shared state for indicating that the respective
memory block is shared by the computer node with external devices, and a global invalid state for indicating
that the respective memory block is invalid in the computer node, the method comprising:

snooping the common bus to monitor memory access requests on the common bus;

receiving, at the coherence transformer, a first memory access request from the external device to cache a
first memory block;

obtaining a first copy of the first memory block from the common bus using the coherence transformer,
characterized in that the coherence transformer has a snoop tag array (250) with a plurality of snoop tags,
each of the plurality of snoop tags being configured to identify one of the memory blocks when it is cached by
the external device and to track an external state of a copy of the memory block, the external state including
one of an external exclusive state for indicating that the copy of the memory block is exclusive to the
external domain, an external shared state for indicating that the copy of the memory block is shared by the
external domain, and an external invalid state for indicating that the copy of the memory block is invalid in
the external domain; and

if at least one tag in the plurality of snoop tags is available for tracking the external state of the first
copy of the first memory block, responding to the first memory access request using a snoop-only approach, in
which the tag is used to track the external state of the first copy of the first memory block for an entire
duration during which the first memory block is cached by the external device;

otherwise, if at least one tag in the plurality of snoop tags is not available for tracking the external state
of the first copy of the first memory block, responding to the first memory access request using an Mtag-only
approach, in which, using the coherence transformer, a tag for the first memory block is temporarily stored
until a global state associated with the first memory block can be written back to the memory module;

wherein the first copy of the first memory block is sent from the coherence transformer to the external
device.
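
The choice between the two approaches in claim 1 can be illustrated with a minimal Python sketch. It assumes a fixed-capacity tag array keyed by block address; the names `SnoopTagArray`, `try_allocate`, and `choose_approach` are hypothetical and do not come from the patent, and the sketch treats "a tag is available" simply as "the array is not full".

```python
from dataclasses import dataclass, field
from enum import Enum


class ExternalState(Enum):
    EXCLUSIVE = "exclusive"
    SHARED = "shared"
    INVALID = "invalid"


@dataclass
class SnoopTagArray:
    """Hypothetical fixed-capacity model of the snoop tag array (250)."""
    capacity: int
    tags: dict = field(default_factory=dict)  # block address -> ExternalState

    def try_allocate(self, address: int, state: ExternalState) -> bool:
        """Start tracking `address` if a tag is free; report whether it worked."""
        if len(self.tags) >= self.capacity:
            return False
        self.tags[address] = state
        return True


def choose_approach(array: SnoopTagArray, address: int, exclusive: bool) -> str:
    """Pick the approach for an external request to cache `address`."""
    state = ExternalState.EXCLUSIVE if exclusive else ExternalState.SHARED
    if array.try_allocate(address, state):
        return "snoop-only"  # a tag tracks the block while it is cached externally
    return "mtag-only"       # no free tag: the state lives in the Mtag in memory


# Assumed example: one tag only, so the second request falls back to Mtag-only.
array = SnoopTagArray(capacity=1)
first = choose_approach(array, 0x100, exclusive=True)
second = choose_approach(array, 0x200, exclusive=False)
print(first, second)
```

With a single tag, the first request is tracked in the array and the second falls back to the Mtag in the memory module, which is the limited-memory trade-off the claim describes.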

2. The method of claim 1, wherein the first memory access request from the external device represents a
request for an exclusive copy of the first memory block, and the step of responding to the first memory access
request using the Mtag-only approach further includes a step of changing the first Mtag in the memory module
to a global invalid state.

3. The method of claim 2, wherein the step of responding to the first memory access request using the
Mtag-only approach further comprises a step of invalidating all valid copies of the first memory block at the
computer node.

4. The method of claim 1, wherein the first memory access request from the external device represents either a
request for an exclusive copy of the first memory block or a request for a shared copy of the first memory
block, wherein the step of responding to the first memory access request using the Mtag-only approach further
comprises the steps of:

prior to the modifying step, examining the first Mtag associated with the first memory block; and

proceeding with the modifying step and the sending step only if the first Mtag does not represent a global
invalid state.

5. The method of claim 1, wherein the first memory access request from the external device represents a
request for a shared copy of the first memory block, and the step of responding to the first memory access
request using the Mtag-only approach further includes a step of changing the first Mtag in the memory module
to a global shared state.

6. The method of claim 5, wherein the step of responding to the first memory access request using the
Mtag-only approach further includes the steps of:

prior to the modifying step, examining the first Mtag associated with the first memory block;

proceeding with the modifying step and the sending step only if the first Mtag does not represent a global
invalid state; and

if the first Mtag represents a global invalid state, indicating an error condition.

7. The method of claim 1, further comprising the steps of:

receiving, at the coherence transformer, a write back request for a second memory block from the external
device;

obtaining the first copy of the second memory block from the external device using the coherence transformer;

writing the first copy of the second memory block from the coherence transformer to the memory module at the
computer node; and

if the first copy of the first memory block has not been tracked in a snoop tag of the snoop tag array,
modifying, using the coherence transformer, an Mtag associated with the second memory block in the memory
module at the computer node to reflect that the computer node has an exclusive copy of the second memory
block.

8. The method of claim 1, wherein the global state for each of the memory blocks is used as the external state
for each of the memory blocks, wherein a global exclusive state represents an external invalid state, a global
shared state represents an external shared state, and a global invalid state represents an external exclusive
state.
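
The correspondence in claim 8 amounts to a three-entry lookup table. The following Python sketch is illustrative only; the enum and function names are hypothetical.

```python
from enum import Enum


class GlobalState(Enum):
    EXCLUSIVE = "global exclusive"
    SHARED = "global shared"
    INVALID = "global invalid"


class ExternalState(Enum):
    EXCLUSIVE = "external exclusive"
    SHARED = "external shared"
    INVALID = "external invalid"


# Claim 8: the Mtag's global state doubles as the external state.
EXTERNAL_FROM_GLOBAL = {
    GlobalState.EXCLUSIVE: ExternalState.INVALID,   # node owns it, so no external copy
    GlobalState.SHARED: ExternalState.SHARED,       # both sides hold shared copies
    GlobalState.INVALID: ExternalState.EXCLUSIVE,   # invalid locally, so owned externally
}


def external_state(mtag: GlobalState) -> ExternalState:
    """Derive the external state of a block from its Mtag alone."""
    return EXTERNAL_FROM_GLOBAL[mtag]


print(external_state(GlobalState.INVALID).value)
```

Because the mapping is one-to-one, the external state can be derived from the Mtag without a separate tag for that block.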

9. The method of claim 1, further comprising the steps of:

receiving, at the coherence transformer, a write back request for the first memory block from the external
device;

obtaining the first copy of the first memory block from the external device using the coherence transformer;

writing the first copy of the first memory block from the coherence transformer to the memory module at the
computer node; and

if the first copy of the first memory block has been tracked in a snoop tag of the snoop tag array,
deallocating, prior to the writing step, the snoop tag of the snoop tag array, thereby making the snoop tag
available for tracking other externally cached memory blocks and causing the first copy of the first memory
block to be no longer tracked by the snoop tag array.
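
The two write back cases (claim 7 for an untracked block, claim 9 for a tracked block) can be contrasted in a short Python sketch. It is a simplified model under stated assumptions: the snoop tags, Mtags, and memory are plain dictionaries keyed by block address, and the function name and state strings are hypothetical.

```python
def handle_writeback(snoop_tags, memory, mtags, address, data):
    """Apply an external write back and return the approach that handled it."""
    if address in snoop_tags:
        # Claim 9: free the snoop tag before the write so it can track another block.
        del snoop_tags[address]
        memory[address] = data
        return "snoop-only"
    # Claim 7: the block is untracked, so the Mtag in memory records the new state.
    memory[address] = data
    mtags[address] = "global-exclusive"
    return "mtag-only"


# Assumed example: block 0x100 is tracked by a snoop tag, block 0x200 is not.
snoop_tags = {0x100: "exclusive"}
mtags = {0x200: "global-invalid"}
memory = {}
tracked = handle_writeback(snoop_tags, memory, mtags, 0x100, b"new-A")
untracked = handle_writeback(snoop_tags, memory, mtags, 0x200, b"new-B")
print(tracked, untracked, mtags[0x200])
```

In both cases the data reaches the memory module; the difference is only where the coherence state is updated.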

10. The method of claim 1, further comprising the step of responding, by the coherence transformer, to a
second memory access request on the common bus on behalf of the external device, comprising:

monitoring the memory access requests on the common bus, using the coherence transformer to determine whether
a second memory access request of the memory access requests on the common bus pertains to any of the memory
blocks tracked in snoop tags of the snoop tag array; and

if the second memory access request pertains to a second memory block, the second memory block representing
the one of the memory blocks tracked in the snoop tags of the snoop tag array, responding to the second memory
access request using the snoop-only approach, including responding to the second memory access request using
the coherence transformer.

11. The method of claim 10, wherein the coherence transformer responds to the second memory access request in
the snoop-only approach only if a snoop tag tracking the second memory block in the snoop tag array indicates
that a first copy of the second memory block is valid at the external device.

12. The method of claim 11, wherein the second memory access request is a request for an exclusive copy and
the snoop tag tracking the second memory block indicates that the first copy of the second memory block at the
external device is an exclusive copy of the second memory block, wherein the step of responding to the second
memory access request using the snoop-only approach comprises:

obtaining, using the coherence transformer, a second copy of the second memory block from the first copy of
the second memory block at the external device;

invalidating the first copy of the second memory block at the external device;

forwarding the second copy of the second memory block from the coherence transformer to the common bus to
enable an originator of the second memory access request to obtain the second copy of the second memory block;
and

deallocating the snoop tag of the snoop tag array, thereby making the snoop tag available for tracking other
externally cached memory blocks and causing the second memory block to be no longer tracked by the snoop tag
array.

13. The method of claim 11, wherein the second memory access request is a request for an exclusive copy and
the snoop tag tracking the second memory block indicates that the first copy of the second memory block at the
external device is a shared copy of the second memory block, wherein the step of responding to the second
memory access request using the snoop-only approach comprises:

invalidating the first copy of the second memory block at the external device;

obtaining, using the coherence transformer, a second copy of the second memory block from the computer node
via the common bus;

invalidating, using the coherence transformer, any valid copy of the second memory block in the computer node;

forwarding the second copy of the second memory block from the coherence transformer to the common bus to
enable an originator of the second memory access request to obtain the second copy of the second memory block;
and

deallocating the snoop tag of the snoop tag array, thereby making the snoop tag available for tracking other
externally cached memory blocks and causing the second memory block to be no longer tracked by the snoop tag
array.

14. The method of claim 11, wherein the second memory access request is a request for a shared copy and the
snoop tag tracking the second memory block indicates that the first copy of the second memory block at the
external device is a shared copy of the second memory block, wherein the step of responding to the second
memory access request using the snoop-only approach comprises:

obtaining, using the coherence transformer, a second copy of the second memory block from the computer node
via the common bus; and

forwarding the second copy of the second memory block from the coherence transformer to the common bus to
enable the originator of the second memory access request to obtain the second copy of the second memory
block.

15. The method of claim 11, wherein the second memory access request is a request for a shared copy and the
snoop tag tracking the second memory block indicates that the first copy of the second memory block at the
external device is an exclusive copy of the second memory block, wherein the step of responding to the second
memory access request using the snoop-only approach comprises:

obtaining, using the coherence transformer, a second copy of the second memory block from the external device;

changing the snoop tag tracking the second memory block to indicate that the first copy of the second memory
block at the external device is a shared copy of the second memory block; and

forwarding the second copy of the second memory block from the coherence transformer to the common bus to
enable an originator of the second memory access request to obtain the second copy of the second memory block.
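
The four snoop-only cases of claims 12 to 15, together with the validity condition of claim 11, can be summarized as a small decision function. This Python sketch is a simplified model: the tag array is a dictionary from block address to the strings "exclusive" or "shared", the returned action labels are descriptive placeholders, and the function name is hypothetical.

```python
def respond_snoop_only(snoop_tags, address, exclusive_request):
    """Return the ordered actions the coherence transformer takes for a bus request."""
    state = snoop_tags.get(address, "invalid")
    if state == "invalid":
        return []  # claim 11: respond only if the external copy is valid
    if exclusive_request and state == "exclusive":      # claim 12
        actions = ["obtain copy from external device",
                   "invalidate external copy",
                   "forward copy to bus"]
    elif exclusive_request:                             # claim 13 (external copy shared)
        actions = ["invalidate external copy",
                   "obtain copy from node via bus",
                   "invalidate node copies",
                   "forward copy to bus"]
    elif state == "shared":                             # claim 14
        return ["obtain copy from node via bus", "forward copy to bus"]
    else:                                               # claim 15 (external copy exclusive)
        snoop_tags[address] = "shared"                  # downgrade the tag
        return ["obtain copy from external device",
                "mark tag shared",
                "forward copy to bus"]
    # Exclusive requests end external caching, so the tag is released.
    del snoop_tags[address]
    actions.append("deallocate tag")
    return actions


# Assumed example: a shared request for a block held exclusively by the external device.
demo_tags = {0x100: "exclusive"}
demo_actions = respond_snoop_only(demo_tags, 0x100, exclusive_request=False)
print(demo_actions, demo_tags)
```

Only requests for an exclusive copy release the tag; requests for a shared copy leave the block tracked, at most downgrading the tag to shared.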

16. A coherence transformer (200) for facilitating the sharing of memory blocks (112) between a computer node
(100) and an external device, the computer node including a common bus (108) to which the coherence
transformer, a memory module (110), and a processing node (102, 104, 106) having a processor and a cache (114)
are connected, the memory blocks having local physical addresses in the memory module at the computer node,
each of the memory blocks having an associated memory tag (Mtag) for tracking a global state associated with
each memory block, including a global exclusive state for indicating that the memory block is exclusive to the
computer node, a global shared state for indicating that the memory block is shared by the computer node with
the external device, and a global invalid state for indicating that the respective memory block is invalid in
the computer node, the coherence transformer comprising:

snoop logic (260) configured for coupling with the common bus of the computer node, the snoop logic, when
coupled with the common bus, being operable to monitor memory access requests on the common bus;

the coherence transformer being characterized by comprising:

a snoop tag array (250) coupled to the snoop logic, the snoop tag array having a plurality of snoop tags (273,
274, 276, 278, 280), each of the plurality of snoop tags being configured to identify one of the memory blocks
when it is cached by the external device and to track an external state of a copy of the memory block, the
external state including one of an external exclusive state for indicating that the copy of that memory block
is exclusive to the external domain, an external shared state for indicating that the copy of that memory
block is shared by the external domain, and an external invalid state for indicating that the copy of that
memory block is invalid in the external domain; and

logic means for determining (504, 506, 508) whether a first memory access request from the external device to
cache a first memory block should be responded to using a snoop-only approach, in which a tag in the snoop tag
array is operable to track the external state of a copy of the first memory block for an entire duration that
the first memory block is cached by the external device, or using an Mtag-only approach, in which a tag for
the first memory block is temporarily stored until a global state associated with the first memory block can
be written back to the memory module.

17. The coherence transformer of claim 16, further comprising logic (606, 608, 610) for determining whether a
second memory access for a second memory block on the common bus should be responded to using the snoop-only
approach or the Mtag-only approach, wherein the second memory access is responded to by the coherence
transformer using the snoop-only approach if the second memory block is tracked by the snoop tag array, and
wherein the second memory block is responded to by the memory module using the Mtag-only approach if the
second memory access is not tracked by the snoop tag array.

18. A computer system comprising a computer node (100), the coherence transformer (200) of claim 16 or claim
17, and an external device, the computer node including a common bus (108) to which the coherence transformer
(200), a memory module (110), and a processing node (102, 104, 106) having a processor (116) and a cache (114)
are connected.

Claims

1. A method for enabling an external device (202, 204, 206) in an external domain that is external to a
computer node (100) of a computer system to share memory blocks (112) having local physical addresses in a
memory module (110) at the computer node, irrespective of whether the external device and the common bus (108)
at the computer node both employ a common protocol and irrespective of whether the external device and the
common bus both operate at the same speed, the computer node including a coherence transformer (200), the
memory module, and a processing node connected to the common bus, the processing node (102, 104, 106) having a
processor (116) and a cache (114), each of the memory blocks having an associated memory tag (Mtag) for
tracking a global state associated with each memory block, including a global exclusive state for indicating
that each memory block is exclusive to the computer node, a global shared state for indicating that each
memory block is shared by the computer node and the external device, and a global invalid state for indicating
that each memory block is invalid in the computer node, the method comprising the steps of:

snooping the common bus in order to monitor memory access requests on the common bus;

receiving, at the coherence transformer, a first memory access request from the external device to cache a
first memory block;

obtaining a first copy of the first memory block from the common bus using the coherence transformer,
characterized in that the coherence transformer includes a snoop tag array (250) having a plurality of snoop
tags, each of the plurality of snoop tags being configured to identify one of the memory blocks if it is
cached by the external device and to track an external state of a copy of that memory block, the external
state including one of an external exclusive state for indicating that the copy of that memory block is
exclusive to the external domain, an external shared state for indicating that the copy of that memory block
is shared by the external domain, and an external invalid state for indicating that the copy of that memory
block is invalid in the external domain; and

if at least one tag of the plurality of snoop tags is available to track the external state of the first copy
of the first memory block, responding to the first memory access request using a snoop-only approach in which
the tag is used to track the external state of the first copy of the first memory block for an entire duration
that the first memory block is cached by the external device;

otherwise, if at least one of the tags of the plurality of snoop tags is not available to track the external
state of the first copy of the first memory block, responding to the first memory access request using a
memory tag (Mtag)-only approach in which, using the coherence transformer, a tag for the first memory block is
temporarily stored until a global state associated with the first memory block can be written back to the
memory module;

the first copy of the first memory block being sent from the coherence transformer to the external device.

2. The method of claim 1, wherein the first memory access request from the external device represents a
request for an exclusive copy of the first memory block, and the step of responding to the first memory access
request using the memory-tag-only approach further includes a step of changing the first memory tag in the
memory module to a global invalid state.

3. The method of claim 2, wherein the step of responding to the first memory access request using the
memory-tag-only approach further includes a step of invalidating all valid copies of the first memory block at
the computer node.



4. The method of claim 1, wherein the first memory access request from the external device represents either a
request for an exclusive copy of the first memory block or a request for a shared copy of the first memory
block, and the step of responding to the first memory access request using the memory-tag-only approach
further includes the steps of:

prior to the modifying step, examining the first memory tag (Mtag) associated with the first memory block; and

proceeding with the modifying step and the sending step only if the first memory tag (Mtag) does not represent
a global invalid state.

5. The method of claim 1, wherein the first memory access request from the external device represents a
request for a shared copy of the first memory block, and the step of responding to the first memory access
request using the memory-tag-only approach further includes a step of modifying the first memory tag in the
memory module to a global shared state.

6. The method of claim 5, wherein the step of responding to the first memory access request using the
memory-tag-only approach further includes the steps of:

prior to the modifying step, examining the first memory tag (Mtag) associated with the first memory block;

proceeding with the modifying step and the sending step only if the first memory tag (Mtag) does not represent
a global invalid state; and

if the first memory tag (Mtag) represents a global invalid state, flagging an error condition.

7. The method of claim 1, further comprising the steps of:

receiving, at the coherence transformer, a write back request for a second memory block from the external
device;

obtaining the first copy of the second memory block from the external device using the coherence transformer;

writing the first copy of the second memory block from the coherence transformer to the memory module at the
computer node; and

if the first copy of the first memory block is not tracked in a snoop tag of the snoop tag array, modifying,
using the coherence transformer, a memory tag (Mtag) associated with the second memory block in the memory
module at the computer node to reflect that the computer node has an exclusive copy of the second memory
block.

8. The method of claim 1, wherein the global state for each of the memory blocks is employed as the external
state for each of the memory blocks, such that a global exclusive state represents an external invalid state,
a global shared state represents an external shared state, and a global invalid state represents an external
exclusive state.

9. The method of claim 1, further comprising the steps of:

receiving, at the coherence transformer, a write back request for the first memory block from the external
device;

obtaining the first copy of the first memory block from the external device using the coherence transformer;

writing the first copy of the first memory block from the coherence transformer to the memory module at the
computer node; and

if the first copy of the first memory block has been tracked in a snoop tag of the snoop tag array,
deallocating, prior to the writing step, the snoop tag of the snoop tag array, so as to make the snoop tag
available for tracking other externally cached memory blocks and to cause the first copy of the first memory
block to be no longer tracked by the snoop tag array.

10. The method of claim 1, further comprising the step of responding, by the coherence transformer, to a
second memory access request on the common bus on behalf of the external device, comprising the steps of:

monitoring the memory access requests on the common bus, using the coherence transformer, to determine whether
a second memory access request of the memory access requests on the common bus pertains to any one of the
memory blocks tracked in the snoop tags of the snoop tag array; and

if the second memory access request pertains to a second memory block, the second memory block representing
the one of the memory blocks tracked in the snoop tags of the snoop tag array, responding to the second memory
access request using the snoop-only approach, including responding to the second memory access request using
the coherence transformer.

11. The method of claim 10, wherein the coherence transformer responds to the second memory access request in
the snoop-only approach only if a snoop tag tracking the second memory block in the snoop tag array indicates
that a first copy of the second memory block is valid at the external device.

12. The method of claim 11, wherein the second memory access request is a request for an exclusive copy and
the snoop tag tracking the second memory block indicates that the first copy of the second memory block at the
external device is an exclusive copy of the second memory block, and the step of responding to the second
memory access request comprises the steps of:

obtaining, using the coherence transformer, a second copy of the second memory block from the first copy of
the second memory block at the external device;

invalidating the first copy of the second memory block at the external device;

forwarding the second copy of the second memory block from the coherence transformer to the common bus to
enable an originator of the second memory access request to obtain the second copy of the second memory block;
and

deallocating the snoop tag of the snoop tag array, so as to make the snoop tag available for tracking other
externally cached memory blocks and to cause the second memory block to be no longer tracked by the snoop tag
array.

13. The method of claim 11, wherein the second memory access request is a request for an exclusive copy and
the snoop tag tracking the second memory block indicates that the first copy of the second memory block at the
external device is a shared copy of the second memory block, the step of responding to the second memory
access request using the snoop-only approach comprising the steps of:

invalidating the first copy of the second memory block at the external device;

obtaining, using the coherence transformer, a second copy of the second memory block from the computer node
via the common bus;

invalidating, using the coherence transformer, any valid copy of the second memory block in the computer node;

forwarding the second copy of the second memory block from the coherence transformer to the common bus to
enable an originator of the second memory access request to obtain the second copy of the second memory block;
and

deallocating the snoop tag of the snoop tag array, so as to make the snoop tag available for tracking other
externally cached memory blocks and to cause the second memory block to be no longer tracked by the snoop tag
array.

14. The method of claim 11, wherein the second memory access request is a request for a shared copy and the
snoop tag tracking the second memory block indicates that the first copy of the second memory block at the
external device is a shared copy of the second memory block, the step of responding to the second memory
access request using the snoop-only approach comprising the steps of:

obtaining, using the coherence transformer, a second copy of the second memory block from the computer node
via the common bus; and

forwarding the second copy of the second memory block from the coherence transformer to the common bus to
enable an originator of the second memory access request to obtain the second copy of the second memory block.

15. The method of claim 11, wherein said second memory access request is a request for a shared copy and said
snoop tag tracking said second memory block indicates that said first copy of said second memory block at said
external device is an exclusive copy of said second memory block, said step of responding to said second memory
access request using said snoop tag-only approach comprising the steps of:

obtaining, using said coherence transformer, a second copy of said second memory block from said external
device;

changing said snoop tag tracking said second memory block to indicate that said first copy of said second
memory block at said external device is a shared copy of said second memory block; and

sending said second copy of said second memory block from said coherence transformer to said common
bus to enable an originator of said second memory access request to obtain said second copy of said second
memory block.
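
The two cases of claims 14 and 15 can be illustrated with a minimal Python sketch. It is not part of the claimed subject matter. All names (`ExtState`, `SnoopTag`, `respond_shared_request`, and the two fetch callbacks) are hypothetical and introduced only for illustration; the claims do not prescribe any particular implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable


class ExtState(Enum):
    """External state of the copy held by the external device."""
    EXCLUSIVE = auto()
    SHARED = auto()
    INVALID = auto()


@dataclass
class SnoopTag:
    """Hypothetical snoop tag: tracks one memory block cached externally."""
    address: int
    state: ExtState


def respond_shared_request(
    tag: SnoopTag,
    fetch_from_node: Callable[[int], bytes],
    fetch_from_external: Callable[[int], bytes],
) -> bytes:
    """Return the copy the coherence transformer places on the common bus."""
    if tag.state is ExtState.SHARED:
        # Claim 14: the external copy is shared, so the copy is obtained
        # from the computer node via the common bus.
        return fetch_from_node(tag.address)
    if tag.state is ExtState.EXCLUSIVE:
        # Claim 15: the external copy is exclusive, so the copy is obtained
        # from the external device and the snoop tag is changed to shared.
        data = fetch_from_external(tag.address)
        tag.state = ExtState.SHARED
        return data
    # Other states are outside claims 14 and 15 and are not modelled here.
    raise ValueError("case not covered by this sketch")
```

In this sketch the only state change is the downgrade from exclusive to shared in the claim 15 case; in the claim 14 case the snoop tag is left untouched.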

16. A coherence transformer (200) for facilitating the sharing of memory blocks (112) between a computer node
(100) and an external device, said computer node having a common bus (108) to which said coherence transformer,
a memory module (110) and a processing node (102, 104, 106) having a processor and a cache (114) are
connected, said memory blocks having local physical addresses in the memory module at said computer node,
each of said memory blocks having an associated memory tag (Mtag) for tracking a global state associated with
each memory block, including a global exclusive state for indicating that the memory block is exclusive to said
computer node, a global shared state for indicating that the memory block is shared by said computer node and
said external device, and a global invalid state for indicating that the memory block is invalid in said computer
node, said coherence transformer comprising:

snooping logic (260) configured to be coupled to said common bus of said computer node, said snooping
logic, when coupled to said common bus, being operable to monitor memory access requests on said common
bus;

said coherence transformer being characterised in that it comprises:

a snoop tag array (250) coupled to said snooping logic, said snoop tag array having a plurality of snoop
tags (273, 274, 276, 278, 280), each of said plurality of snoop tags being configured to identify one of
said memory blocks if it is cached by said external device, and to track an external state of a copy of that
memory block, said external state including one of an external exclusive state for indicating that said copy
of that memory block is exclusive to said external domain, an external shared state for indicating that
said copy of that memory block is shared by said external domain, and an external invalid state for
indicating that said copy of that memory block is invalid in said external domain; and

logic means for determining (504, 506, 508) whether a first memory access request from said external
device to cache a first memory block is to be responded to using a snoop tag-only approach, in which a
tag of said snoop tag array can be employed to track said external state of a copy of said first memory
block for an entire duration during which said first memory block is cached by said external device, or
using a memory tag (Mtag)-only approach, in which a tag for said first memory block is temporarily stored
until a global state associated with said first memory block can be written back to said memory module.
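
The snoop tag array and the choice between the two approaches recited in claim 16 can be pictured with the following Python sketch. It is illustrative only. The class and method names are hypothetical, and the selection criterion is an assumption: the claim states that the logic means determines which approach is used, not how, so the sketch simply picks the snoop tag-only approach whenever a tag is free or the block is already tracked.

```python
from enum import Enum, auto
from typing import Dict


class ExtState(Enum):
    """External state of a copy cached by the external device."""
    EXCLUSIVE = auto()
    SHARED = auto()
    INVALID = auto()


class Approach(Enum):
    SNOOP_TAG_ONLY = auto()
    MTAG_ONLY = auto()


class SnoopTagArray:
    """Hypothetical fixed-capacity snoop tag array, keyed by block address."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.tags: Dict[int, ExtState] = {}

    def is_tracked(self, address: int) -> bool:
        return address in self.tags

    def choose_approach(self, address: int, state: ExtState) -> Approach:
        """Decide how to serve an external request to cache `address`.

        ASSUMPTION: the criterion below (a free tag exists, or the block
        is already tracked) is not taken from the claim.
        """
        if self.is_tracked(address) or len(self.tags) < self.capacity:
            # The tag tracks the external state while the block is cached
            # by the external device.
            self.tags[address] = state
            return Approach.SNOOP_TAG_ONLY
        # No tag available: rely on the Mtag kept with the memory block.
        return Approach.MTAG_ONLY
```

With a capacity of one, a first request is served snoop tag-only and a request for a different block falls back to the Mtag-only approach.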


17. The coherence transformer of claim 16, further comprising logic (606, 608, 610) for determining whether a
second memory access for a second memory block on said common bus is to be responded to using said snoop
tag-only approach or said memory tag (Mtag)-only approach, said second memory access being responded to by
said coherence transformer using said snoop tag-only approach when said second memory block is tracked by
said snoop tag array, said second memory block being responded to by said memory module using said memory
tag (Mtag)-only approach when said second memory access is not tracked by said snoop tag array.
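
The dispatch rule of claim 17 reduces to a lookup in the snoop tag array. A minimal Python sketch follows; the function name, the string labels and the use of a set of tracked addresses are assumptions made for illustration only.

```python
from typing import Set


def responder_for(address: int, tracked: Set[int]) -> str:
    """Name the unit that answers a bus access for the block at `address`.

    `tracked` stands in for the addresses currently held in the snoop
    tag array (hypothetical representation).
    """
    if address in tracked:
        # Block is tracked: the coherence transformer responds,
        # using the snoop tag-only approach.
        return "coherence_transformer"
    # Block is not tracked: the memory module responds,
    # using the Mtag-only approach.
    return "memory_module"
```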

18. A computer system comprising a computer node (100), the coherence transformer (200) of claim 16 or claim 17,
and an external device, said computer node having a common bus (108) to which said coherence transformer
(200), a memory module (110) and a processing node (102, 104, 106) having a processor (116) and a cache
(114) are connected.